A common economic occurrence is the following: Two parties, principal and 
agent, are in a situation — typically of their choosing — in which actions by the 
agent impose an externality on the principal. Not surprisingly, the principal 
will want to influence the agent's actions. This influence will often take the 
form of a contract that has the principal compensating the agent contingent 
on either his actions or the consequences of his actions. Table 1 lists some 
examples of situations like this. Note that, in many of these examples, the 
principal is buying a good or service from the agent. That is, many buyer-seller 
relationships naturally fit into the principal-agent framework. This note covers 
the basic tools and results of agency theory in this context. 



Table 1: Examples of Moral-Hazard Problems

Principal | Agent | Problem | Solution
----------|-------|---------|---------
Employer | Employee | Induce employee to take actions that increase employer's profits, but which he finds personally costly. | Base employee's compensation on employer's profits.
Plaintiff | Attorney | Induce attorney to expend costly effort to increase plaintiff's chances of prevailing at trial. | Make attorney's fee contingent on damages awarded plaintiff.
Homeowner | Contractor | Induce contractor to complete work (e.g., remodel kitchen) on time. | Give contractor bonus for completing job on time.
Landlord | Tenant | Induce tenant to make investments (e.g., in time or money) that preserve or enhance property's value to the landlord. | Pay the tenant a fraction of the increased value (e.g., share-cropping contract). Alternatively, make tenant post deposit to be forfeited if value declines too much.



To an extent, the principal-agent problem finds its root in the early literature 
on insurance. There, the concern was that someone who insures an asset might 
then fail to maintain the asset properly (e.g., park his car in a bad neighborhood).
Typically, such behavior was either unobservable by the insurance com- 
pany or too difficult to contract against directly; hence, the insurance contract 
could not be directly contingent on such behavior. But because this behavior — 
known as moral hazard — imposes an externality on the insurance company (in 
this case, a negative one), insurance companies were eager to develop contracts 
that guarded against it. So, for example, many insurance contracts have de- 
ductibles — the first k dollars of damage must be paid by the insured rather 
than the insurance company. Since the insured now has k dollars at risk, he'll think 
twice about parking in a bad neighborhood. That is, the insurance contract 
is designed to mitigate the externality that the agent — the insured — imposes 
on the principal — the insurance company. Although principal-agent analysis is 
more general than this, the name "moral hazard" has stuck and, so, the types 
of problems considered here are often referred to as moral-hazard problems. A 
more descriptive name, which is also used in the literature, is hidden-action 
problems. 1 

As we've already suggested, the principal-agent model with hidden action 
has been applied to many questions in economics and other social sciences. Not 
surprisingly, we will focus on well-known results. Nonetheless, we feel there's 
some value added to this. First, we will establish some notation and standard 
reasoning. Second, we will focus on examples, drawn from industrial organiza- 
tion and the theory of the firm, of interest in the study of strategy. Finally, we 
offer our personal opinions on the achievements and weaknesses of the moral- 
hazard model. 

1 The Moral-Hazard Setting 

Let us first give a general picture of the situation we wish to analyze in depth. 

1. Two players are in an economic relationship characterized by the following 
two features: First, the actions of one player, the agent, affect the well-being 
of the other player, the principal. Second, the players can agree 
ex ante to a reward schedule by which the principal pays the agent. 2 

The reward schedule represents an enforceable contract (i.e., if there is a 
dispute about whether a player has lived up to the terms of the contract, 
then a court or similar body can adjudicate the dispute). 

2. The agent's action is hidden; that is, he knows what action he has taken 
but the principal does not directly observe his action. (We will, however, 
consider as a benchmark the situation in which the action can be contracted 
on directly.) Moreover, the agent has complete discretion in choosing 
his action from some set of feasible actions. 3 

1 Other surveys include Hart and Holmström (1987), to whom we are greatly indebted. The 
books by Salanié (1997) and Macho-Stadler and Pérez-Castrillo (1997) also include coverage 
of this material. 

2 Although we typically think in terms of positive payments, in many applications payments 
could be negative; that is, the principal fines or otherwise punishes the agent. 

3 Typically, this set is assumed to be exogenous to the relationship. One could imagine, 

3. The actions determine, usually stochastically, some performance measures. 
In many models, these are identical to the benefits received by the principal, 
although in some contexts the two are distinct. The reward schedule 
is a function of (at least some of) these performance variables. In 
particular, the reward schedule can be a function of the verifiable performance 
measures (recall that information is verifiable if it can be observed 
perfectly, without error, by third parties). 

4. The structure of the situation is common knowledge between the players. 

For example, consider a salesperson who has discretion over the amount of 
time or effort he expends promoting his company's products. Many of these 
actions are unobservable by his company. The company can, however, measure 
in a verifiable way the number of orders or revenue he generates. Because these 
measures are, presumably, correlated with his actions (i.e., the harder he works, 
the more sales he generates on average), it may make sense for the company to 
base his pay on his sales — put him on commission — to induce him to expend 
the appropriate level of effort. 

Here, we will also be imposing some additional structure on the situation: 

• The players are symmetrically informed at the time they agree to a reward 
schedule. 

• Bargaining is take-it-or-leave-it: The principal proposes a contract (reward 
schedule), which the agent either accepts or rejects. If he rejects it, the 
game ends and the players receive their reservation utilities (their expected 
utilities from pursuing their next best alternatives). If he accepts, then 
both parties are bound by the contract. 

• Contracts cannot be renegotiated. 

• Once the contract has been agreed to, the only player to take further 
actions is the agent. 

• The game is played once. In particular, there is only one period in which 
the agent takes actions and the agent completes his actions before any 
performance measures are realized. 

All of these are common assumptions and, indeed, might be taken to constitute 
part of the "standard" principal-agent model. 

The link between actions and performance can be seen as follows. Perfor- 
mance is a random variable and its probability distribution depends on the ac- 
tions taken by the agent. So, for instance, a salesperson's efforts could increase 
his average (expected) sales, but he still faces upside risk (e.g., an economic 
boom in his sales region) and downside risk (e.g., introduction of a rival product). 
Because the performance measure is only stochastically related to the 
action, it is generally impossible to perfectly infer the action from the realiza- 
tion of the performance measure. That is, the performance measure does not, 
generally, reveal the agent's action — it remains "hidden" despite observing the 
performance measure. 

The link between actions and performance can also be viewed in an indirect 
way in terms of a state-space model. Performance is a function of the agent's 
actions and of the state of nature; that is, a parameter (scalar or vector) that 
describes the economic environment (e.g., the economic conditions in the 
salesperson's territory). In this view, the agent takes his action before knowing the 
state of nature. Typically, we assume that the state of nature is not observable 
to the principal. If she could observe it, then she could perfectly infer the 
agent's action by inverting from realized performance. In this model, it is not 
important whether the agent later observes the state of nature or not, given he 
could deduce it from his observation of his performance and his knowledge of 
his actions. 

There is a strong assumption of physical causality in this setting, namely 
that actions by the agent determine performances. Moreover, the process is 
viewed as a static production process: There are neither dynamics nor feed- 
back. In particular, the contract governs one period of production and the 
game between principal and agent encompasses only this period. In addition, 
when choosing his actions, the agent's information is identical to the principal's. 
Specifically, he cannot adjust his actions as the performance measures are realized. 
The sequentiality between actions and performance is strict: First actions 
are completed and, only then, is performance realized. This strict sequentiality 
is quite restrictive, but relaxing the model to allow for a less rigid timing 
introduces dynamic issues that are far more complex to solve. We will simply 
mention some of them in passing here. 



2 Basic Two-Action Model 

We start with the simplest principal-agent model. Admittedly, it is so simple 
that a number of the issues one would like to understand about contracting 
under moral hazard disappear. On the other hand, many issues remain and, for 
pedagogical purposes at least, it is a good place to start. 4 

2.1 The two-action model 

Consider a salesperson, who will be the agent in this model and who works for a 
manufacturer, the principal. The manufacturer's problem is to design incentives 
for the salesperson to expend effort promoting the manufacturer's product to 
consumers. 

4 But the pedagogical value of this section should not lead us to forget caution. And caution 
is indeed necessary as the model oversimplifies reality to a point that it delivers conclusions 
that have no match in a more general framework. One could say that the two-action model 
is tailored so as to fit with naive intuition and to lead to the desired results without allowing 
us to see fully the (implicit) assumptions on which we are relying. 

Let $x$ denote the level of sales that the salesperson reaches within the period 
under consideration. This level of sales depends upon many demand parameters 
that are beyond the salesperson's control; but, critically, it also depends upon 
the salesperson's efforts: the more effort the salesperson expends, the more 
consumers will buy in expectation. Specifically, suppose that when the salesperson 
does not expend effort, sales are distributed according to distribution function 
$F_0(\cdot)$ on $\mathbb{R}_+$. 5 When he does expend effort, sales are distributed according to $F_1(\cdot)$. Observe that 
effort, here, is a binary choice. Consistent with the story we've told so far, we 
want sales to be greater, in expectation, if the salesperson has expended effort; 
that is, we assume 6 

$$E_1[x] \equiv \int_0^{+\infty} x\,dF_1(x) > \int_0^{+\infty} x\,dF_0(x) \equiv E_0[x] \qquad (1)$$



Having the salesperson expend effort is sales-enhancing, but it is also costly 
for the salesperson. Expending effort causes him disutility C compared to no 
effort. The salesperson has discretion: He can incur a personal cost of C and 
boost sales by choosing action a = 1 (expending effort), or he can expend 
no effort, action a = 0, which causes him no disutility but does not stimulate 
demand either. Like most individuals, the salesperson is sensitive to variations 
of his revenue, so that his preferences over income exhibit risk aversion. We 
also assume that his utility exhibits additive separability in money and action. 
Specifically, let his utility be 

$$U(s, x, a) = u(s) - aC,$$

where $s$ is a payment from the manufacturer and $u(\cdot)$ is strictly increasing and 
concave (he prefers more money to less and is risk averse). Of course, the 
salesperson could also simply choose not to work for the manufacturer. This 
would yield him an expected level of utility equal to $U_R$. The quantity $U_R$ is 
the salesperson's reservation utility. 

The manufacturer is a large risk-neutral company that cares about the sales 
realized on the local market net of the salesperson's remuneration or share of 
the sales. Hence, its preferences are captured by the utility (profit) function: 

$$W(s, x, a) = x - s.$$

Assume the manufacturer's size yields it all the bargaining power in its negoti- 
ations with the salesperson. 

Suppose, as a benchmark, that the manufacturer could observe and establish 
whether the salesperson had expended effort. We will refer to this benchmark 
as the full or perfect information case. Then the manufacturer could use a 
contract that is contingent on the salesperson's effort, a. Moreover, because 

5 Treating the set of "possible" sales as $[0, \infty)$ is without loss of generality, since the 
boundedness of sales can be captured by assuming $F_a(x > \bar{x}) = 0$ for some $\bar{x} < \infty$. 

6 Observe that this notation covers both the case in which $F_a(\cdot)$ is a differentiable 
distribution function and the case in which it is a discrete distribution function. 

the salesperson is risk averse, while the manufacturer is risk neutral, it is most 
efficient for the manufacturer to absorb all risk. Hence, in this benchmark case, 
the salesperson's compensation would not depend on the realization of sales, 
but only on the salesperson's effort. The contract would then be of the form: 

$$s = \begin{cases} s_0 & \text{if } a = 0 \\ s_1 & \text{if } a = 1 \end{cases}$$

If the manufacturer wants the salesperson to expend effort (i.e., choose a = 1), 
then it must choose $s_0$ and $s_1$ to satisfy two conditions. First, conditional on 
accepting the contract, the salesperson must prefer to expend effort; that is, 

$$u(s_1) - C \geq u(s_0). \qquad \text{(IC)}$$



A constraint like this is known as an incentive compatibility constraint (conven- 
tionally abbreviated IC): Taking the desired action must maximize the agent's 
expected utility. Second, conditional on the fact that he will be induced to ex- 
pend effort, he must prefer to sign the contract than to forgo employment with 
the manufacturer; that is, 

$$u(s_1) - C \geq U_R. \qquad \text{(IR)}$$

A constraint like this is known as an individual rationality constraint (conventionally 
abbreviated IR). The IR constraint is also referred to as a participation 
constraint. Moreover, in selecting $s_0$ and $s_1$, the manufacturer wants to maximize 
its profits conditional on gaining acceptance of the contract and inducing 
a = 1. That is, it wishes to solve 

$$\max_{s_0,\, s_1} \; E_1[x] - s_1$$



subject to the constraints (IC) and (IR). If we postulate that 

$$u(s) < U_R \quad \text{and}$$
$$u(s) - C = U_R \qquad (2)$$

both have solutions within the domain of $u(\cdot)$, then the solution to the 
manufacturer's problem is straightforward: $s_1$ solves (2) and $s_0$ is a solution to 
$u(s) < U_R$. It is readily seen that this solution satisfies the constraints. Moreover, 
since $u(\cdot)$ is strictly increasing, there is no smaller payment that the 
manufacturer could give the salesperson and still have him accept the contract; 
that is, the manufacturer cannot increase profits relative to paying $s_1$ by making 
a smaller payment. This contract is known as a forcing contract. For future 
reference, let $s_1^F$ be the solution to (2). Observe that $s_1^F = u^{-1}(U_R + C)$, where 
$u^{-1}(\cdot)$ is the inverse function corresponding to $u(\cdot)$. 

Technical Aside 

The solution to the manufacturer's maximization problem depends on the domain and 
range of the utility function $u(\cdot)$. Let $\mathcal{D}$, an interval in $\mathbb{R}$, be the domain and $\mathcal{R}$ be 
the range. Let $\underline{s}$ be $\inf \mathcal{D}$ (i.e., the greatest lower bound of $\mathcal{D}$) and let $\bar{s}$ be $\sup \mathcal{D}$ (i.e., 
the least upper bound of $\mathcal{D}$). As shorthand for $\lim_{s \downarrow \underline{s}} u(s)$ and $\lim_{s \uparrow \bar{s}} u(s)$, we'll write 
$u(\underline{s})$ and $u(\bar{s})$ respectively. If $u(s) - C < U_R$ for all $s \leq \bar{s}$, then no contract exists that 
satisfies (IR). In this case, the best the manufacturer could hope to do is implement 
a = 0. Similarly, if $u(\bar{s}) - C < u(\underline{s})$, then no contract exists that satisfies (IC). The 
manufacturer would have to be satisfied with implementing a = 0. Hence, a = 1 can 
be implemented if and only if $u(\bar{s}) - C > \max\{U_R, u(\underline{s})\}$. Assuming this condition is 
met, a solution is $s_0 \downarrow \underline{s}$ and $s_1$ solving 

$$u(s) - C \geq \max\{U_R, u(\underline{s})\}.$$

Generally, conditions are imposed on $u(\cdot)$ such that a solution exists to $u(s) < U_R$ and 
(2). Henceforth, we will assume that these conditions have, indeed, been imposed. 
For an example of an analysis that considers bounds on $\mathcal{D}$ that are more binding, see 
Sappington (1983). 

Another option for the manufacturer is, of course, not to bother inducing the 
salesperson to expend effort promoting the product. There are many contracts 
that would accomplish this goal, although the most "natural" is perhaps a 
non-contingent contract: $s_0 = s_1$. Given that the manufacturer doesn't seek to 
induce effort, there is no IC constraint (the salesperson inherently prefers 
not to expend effort) and the only constraint is the IR constraint: 

$$u(s_0) \geq U_R.$$

The expected-profit-maximizing (cost-minimizing) payment is then the smallest 
payment satisfying this expression. Since $u(\cdot)$ is increasing, this entails $s_0 = u^{-1}(U_R)$. 
We will refer to this value of $s_0$ as $s_0^F$. 

The manufacturer's expected profit conditional on inducing $a$ under the optimal 
contract for inducing $a$ is $E_a[x] - s_a^F$. The manufacturer will, thus, prefer 
to induce a = 1 if 

$$E_1[x] - s_1^F > E_0[x] - s_0^F.$$

In what follows, we will assume that this condition is met: That is, in our 
benchmark case of verifiable action, the manufacturer prefers inducing effort 
to not inducing it. 

Observe the steps taken in solving this benchmark case: First, for each 
possible action we solved for the optimal contract that induces that action. Then 
we calculated the expected profits for each possible action assuming the optimal 
contract. The action that is induced is, then, the one that yields the largest 
expected profit. This two-step process for solving for the optimal contract is 
frequently used in contract theory, as we will see. 
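
The two-step procedure can be sketched numerically. What follows is only an illustration under stated assumptions, not part of the original analysis: it assumes log utility (so that $u^{-1}$ is the exponential function) and made-up values for $U_R$, $C$, $E_0[x]$, and $E_1[x]$; the function names are hypothetical.

```python
import math

def cheapest_payments(u_inv, U_R, C):
    """Step 1: least-cost payment implementing each action when effort is verifiable."""
    s0_F = u_inv(U_R)      # IR binds for a = 0: u(s) = U_R
    s1_F = u_inv(U_R + C)  # IR binds for a = 1: u(s) - C = U_R
    return s0_F, s1_F

def preferred_action(E0, E1, s0_F, s1_F):
    """Step 2: pick the action with the larger expected profit E_a[x] - s_a^F."""
    profit0, profit1 = E0 - s0_F, E1 - s1_F
    return (1, profit1) if profit1 > profit0 else (0, profit0)

# Assumed illustrative values (not from the text): u = ln, so u_inv = exp.
U_R, C = 0.0, 0.5
E0, E1 = 2.0, 4.0
s0_F, s1_F = cheapest_payments(math.exp, U_R, C)
action, profit = preferred_action(E0, E1, s0_F, s1_F)
print(action, round(s0_F, 4), round(s1_F, 4), round(profit, 4))
```

With these assumed numbers the extra expected sales from effort outweigh the higher payment, so the benchmark contract induces a = 1.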

2.2 The optimal incentives contract 

Now, we return to the case of interest: The salesperson's (agent's) action is 
hidden. Consequently, the manufacturer cannot make its payment contingent 
on whether the salesperson expends effort. The one remaining verifiable variable 
is performance, as reflected by realized sales, $x$. A contract, then, is a function 
mapping sales into compensation for the salesperson: $s = S(x)$. Facing such 
a contract, the salesperson then freely chooses the action that maximizes his 
expected utility. Consequently, the salesperson chooses action a = 1 if and only 
if: 7 

$$E_1[u(S(x))] - C \geq E_0[u(S(x))]. \qquad \text{(IC')}$$

If this inequality is violated, the salesperson will simply choose not to expend 
effort. Observe that this is the incentive compatibility (IC) constraint in this 
case. 

The game we analyze is in fact a simple Stackelberg game, where the man- 
ufacturer is the first mover — it chooses the payment schedule — to which it is 
committed; and the salesperson is the second mover — choosing his action in 
response to the payment schedule. The solution to 

$$\max_a \; E_a[u(S(x))] - aC$$

(with ties going to the manufacturer; see footnote 7) gives the salesperson's 
equilibrium choice of action as a function of the payment function 
$S(\cdot)$. Solving this contracting problem then requires us to understand what kind 
of contract the manufacturer could and will offer. 

Observe first that if the manufacturer were to offer the fixed-payment contract $S(x) = s_0^F$ 
for all $x$, then, as above, the agent would accept the contract and not bother 
expending effort. Among all contracts that induce the agent to choose 
action a = 0 in equilibrium, this is clearly the cheapest one for the manufacturer. 
The fixed-payment contract set at $s_1^F$ will, however, no longer work given the 
hidden-action problem: Since the salesperson gains $s_1^F$ whatever his efforts, he 
will choose the action that has lesser cost for him, a = 0. It is in fact immediate 
that any fixed-payment contract, which would be optimal if the only concern 
were efficient risk sharing, will induce an agent to choose his least costly action. 
Given that it is desirable, at least in the benchmark full-information case, for 
the salesperson to expend effort selling the product, it seems plausible that the 
manufacturer will try to induce effort even though (as we've just seen) that 
must entail the inefficient (relative to the first best) allocation of risk to the 
salesperson. 
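
The agent's response to a given payment schedule can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: sales take two values with made-up probabilities, utility is logarithmic, ties are broken in favor of effort (on the assumption that the principal prefers a = 1), and the function and variable names are hypothetical.

```python
import math

def expected_utility(contract, dist, u):
    """E[u(S(x))] for a discrete sales distribution given as {x: probability}."""
    return sum(p * u(contract(x)) for x, p in dist.items())

def agent_choice(contract, dist0, dist1, C, u):
    """Agent's best response: 1 (effort) if IC holds, else 0.
    Ties go to effort, assuming the principal prefers a = 1."""
    eu_effort = expected_utility(contract, dist1, u) - C
    eu_no_effort = expected_utility(contract, dist0, u)
    return 1 if eu_effort >= eu_no_effort else 0

# Assumed sales distributions without effort (dist0) and with effort (dist1).
dist0 = {2: 0.8, 10: 0.2}
dist1 = {2: 0.3, 10: 0.7}
C = 0.5

fixed = lambda x: 2.0                       # same payment for any sales level
bonus = lambda x: 6.0 if x == 10 else 0.5   # pays more after high sales

choice_fixed = agent_choice(fixed, dist0, dist1, C, math.log)
choice_bonus = agent_choice(bonus, dist0, dist1, C, math.log)
print(choice_fixed, choice_bonus)
```

Under the fixed payment, effort only subtracts C from the agent's utility, so he chooses a = 0; the sales-contingent schedule in this example makes effort worthwhile.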

7 We assume, when indifferent among a group of actions, that the agent chooses from that 
group the action that the principal prefers. This assumption, although often troubling to 
those new to agency theory, is not truly a problem. Recall that the agency problem is a game. 
Consistent with game theory, we're looking for an equilibrium of this game; i.e., a situation 
in which players are playing mutual best responses and in which they correctly anticipate 
the best responses of their opponents. Were the agent to behave differently when indifferent, 
then we wouldn't have an equilibrium because the principal would vary her strategy — offer a 
different contract — so as to break this indifference. Moreover, it can be shown that in many 
models the only equilibrium has the property that the agent chooses among his best responses 
(the actions among which he is indifferent given the contract) the one most preferred by the 
principal. 

We now face two separate questions. First, conditional on the manufacturer 
wanting to cause the salesperson to expend effort, what is the optimal 
(least-expected-cost) contract for the manufacturer to offer? Second, are the 
manufacturer's expected profits greater doing this than not inducing the salesperson 
to expend effort (i.e., greater than the expected profits from offering the 
fixed-payment contract $S(x) = s_0^F$)? 

As in the benchmark case, not only must the contract give the salesperson 
an incentive to expend effort (i.e., meet the IC constraint), it must also 
be individually rational: 

$$E_1[u(S(x))] - C \geq U_R. \qquad \text{(IR')}$$

The optimal contract is then the solution to the following program: 

$$\max_{S(\cdot)} \; E_1[x - S(x)] \qquad (3)$$

subject to (IC') and (IR'). 

The next few sections will consider the solution to (3) under a number of different 
assumptions about the distribution functions $F_a(\cdot)$. 

Two assumptions on $u(\cdot)$, in addition to those already given, that will be 
common to these analyses are: 

1. The domain of $u(\cdot)$ is $(\underline{s}, \infty)$, $\underline{s} \geq -\infty$. 8 

2. $\lim_{s \downarrow \underline{s}} u(s) = -\infty$ and $\lim_{s \uparrow \infty} u(s) = \infty$. 

An example of a function satisfying all the assumptions on $u(\cdot)$ is $\ln(\cdot)$ with 
$\underline{s} = 0$. For the most part, these assumptions are for convenience and are more 
restrictive than we need (for instance, many of our results will also hold if 
$u(s) = \sqrt{s}$, although this fails the second assumption). Some consequences of 
these and our earlier assumptions are 

• $u(\cdot)$ is invertible (a consequence of its strict monotonicity). Let $u^{-1}(\cdot)$ 
denote its inverse. 

• The domain of $u^{-1}(\cdot)$ is $\mathbb{R}$ (a consequence of the last two assumptions). 

• $u^{-1}(\cdot)$ is continuous, strictly increasing, and convex (a consequence of the 
continuity of $u(\cdot)$, its concavity, and that it is strictly increasing). 

2.3 Two-outcome model 

Imagine that there are only two possible realizations of sales, high and low, 
denoted respectively by $x_H$ and $x_L$, with $x_H > x_L$. Assume that 

$$F_a(x_L) = k - qa,$$

8 Since its domain is an open interval and it is concave, $u(\cdot)$ is continuous everywhere on 
its domain (see van Tiel, 1984, p. 5). 

where $q \in (0, 1]$ and $k \in [q, 1]$ are known constants. 

A contract is $s_H = S(x_H)$ and $s_L = S(x_L)$. We can, thus, write program 
(3) as 

$$\max_{s_H,\, s_L} \; (1 + q - k)(x_H - s_H) + (k - q)(x_L - s_L)$$

subject to 

$$(1 + q - k)\,u(s_H) + (k - q)\,u(s_L) - C \geq (1 - k)\,u(s_H) + k\,u(s_L) \qquad \text{(IC)}$$

and 

$$(1 + q - k)\,u(s_H) + (k - q)\,u(s_L) - C \geq U_R \qquad \text{(IR)}$$

We could solve this problem mechanically using the usual techniques for maxi- 
mizing a function subject to constraints, but it is far easier, here, to use a little 
intuition. To begin, we need to determine which constraints are binding. Is 
IC binding? Well, suppose it were not. Then the problem would simply be 
one of optimal risk sharing, because, by supposition, the incentive problem no 
longer binds. But we know optimal risk sharing entails $s_H = s_L$; that is, a 
fixed-payment contract. 9 As we saw above, however, a fixed-payment contract 
cannot satisfy IC: 

u(s) - C < u(s) . 

Hence, IC must be binding. 

What about IR? Is it binding? Suppose it were not (i.e., it were a strict 
inequality) and let $s_L^*$ and $s_H^*$ be the optimal contract. Then there must exist 
an $\varepsilon > 0$ such that 

$$(1 + q - k)\left[u(s_H^*) - \varepsilon\right] + (k - q)\left[u(s_L^*) - \varepsilon\right] - C \geq U_R.$$

Let $\tilde{s}_n = u^{-1}\left[u(s_n^*) - \varepsilon\right]$, $n \in \{L, H\}$. Clearly, $\tilde{s}_n < s_n^*$ for both $n$, so that the 
$\{\tilde{s}_n\}$ contract costs the manufacturer less than the $\{s_n^*\}$ contract; or, equivalently, 
the $\{\tilde{s}_n\}$ contract yields the manufacturer greater expected profits than 
the $\{s_n^*\}$ contract. Moreover, the $\{\tilde{s}_n\}$ contract satisfies IC: 

$$\begin{aligned}
(1 + q - k)\,u(\tilde{s}_H) + (k - q)\,u(\tilde{s}_L) - C
&= (1 + q - k)\left[u(s_H^*) - \varepsilon\right] + (k - q)\left[u(s_L^*) - \varepsilon\right] - C \\
&= (1 + q - k)\,u(s_H^*) + (k - q)\,u(s_L^*) - C - \varepsilon \\
&\geq (1 - k)\,u(s_H^*) + k\,u(s_L^*) - \varepsilon \quad \text{(since } \{s_n^*\} \text{ must satisfy IC)} \\
&= (1 - k)\,u(\tilde{s}_H) + k\,u(\tilde{s}_L).
\end{aligned}$$

9 The proof is straightforward if $u(\cdot)$ is differentiable: Let $\lambda$ be the Lagrange multiplier on 
(IR). The first-order conditions with respect to $s_L$ and $s_H$ are 

$$k - q - \lambda (k - q)\,u'(s_L) = 0$$

and 

$$1 + q - k - \lambda (1 + q - k)\,u'(s_H) = 0,$$

respectively. Solving, it is clear that $s_L = s_H$. The proof when $u(\cdot)$ is not (everywhere) 
differentiable is only slightly harder and is left to the reader. 

But this means that $\{\tilde{s}_n\}$ satisfies both constraints and yields greater expected 
profits, which contradicts the optimality of $\{s_n^*\}$. Therefore, by contradiction, 
we may conclude that IR is also binding at the optimal contract for inducing 
a = 1. 

We're now in a situation where the two constraints must bind at the optimal 
contract. But, given we have only two unknown variables, $s_H$ and $s_L$, this means 
we can solve for the optimal contract merely by solving the constraints. Doing 
so yields 

$$s_H = u^{-1}\left(U_R + \frac{k}{q}\,C\right) \quad \text{and} \quad s_L = u^{-1}\left(U_R - \frac{1 - k}{q}\,C\right). \qquad (4)$$

Observe that the payments vary with the state (as we knew they must because 
fixed payments fail the IC constraint). 
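
For readers who want the intermediate algebra, the step from the two binding constraints to the solution in (4) can be written out as follows. This is a routine derivation added for convenience; it uses only the symbols already defined in the text.

```latex
% Binding IC: effort and no effort give the same expected utility.
(1 + q - k)\,u(s_H) + (k - q)\,u(s_L) - C = (1 - k)\,u(s_H) + k\,u(s_L)
\;\Longrightarrow\; q\,[u(s_H) - u(s_L)] = C
\;\Longrightarrow\; u(s_H) = u(s_L) + \frac{C}{q}.

% Binding IR, after substituting the expression for u(s_H):
(1 + q - k)\left[u(s_L) + \frac{C}{q}\right] + (k - q)\,u(s_L) = U_R + C
\;\Longrightarrow\; u(s_L) = U_R + C - \frac{1 + q - k}{q}\,C = U_R - \frac{1 - k}{q}\,C.

% Hence
u(s_H) = U_R - \frac{1 - k}{q}\,C + \frac{C}{q} = U_R + \frac{k}{q}\,C,
% and applying u^{-1} to both utilities gives the two payments in (4).
```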

Recall that were $a$ verifiable, the contract would be $S(x) = s_1^F = u^{-1}(U_R + C)$. 
Rewriting (4) we see that 

$$s_H = u^{-1}\left(u(s_1^F) + \frac{k - q}{q}\,C\right) \quad \text{and} \quad s_L = u^{-1}\left(u(s_1^F) - \frac{1 + q - k}{q}\,C\right);$$

that is, one payment is above the payment under full information, while the other 
is below the payment under full information. Moreover, the expected payment 
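to the salesperson is greater than $s_1^F$: 

$$\begin{aligned}
(1 + q - k)\,s_H + (k - q)\,s_L
&= (1 + q - k)\,u^{-1}\left(u(s_1^F) + \frac{k - q}{q}\,C\right) + (k - q)\,u^{-1}\left(u(s_1^F) - \frac{1 + q - k}{q}\,C\right) \\
&\geq u^{-1}\left((1 + q - k)\left(u(s_1^F) + \frac{k - q}{q}\,C\right) + (k - q)\left(u(s_1^F) - \frac{1 + q - k}{q}\,C\right)\right) \\
&= u^{-1}\left[u(s_1^F)\right] = s_1^F;
\end{aligned} \qquad (5)$$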



where the inequality follows from Jensen's inequality. 10 Provided the agent is 
strictly risk averse and q < k, the above inequality is strict: Inducing the agent 
to choose a = 1 costs strictly more in expectation when the principal cannot 
verify the agent's action. 
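
A small numerical check may make the cost of the hidden-action problem concrete. This sketch is an illustration under assumptions that are not in the text: log utility (so $u^{-1} = \exp$) and made-up parameter values; the function name is hypothetical. It computes the payments from (4), confirms that IC and IR both bind, and compares the expected payment with the full-information payment.

```python
import math

def second_best_payments(u_inv, U_R, C, q, k):
    """Payments from equation (4): both IC and IR hold with equality."""
    u_H = U_R + (k / q) * C
    u_L = U_R - ((1 - k) / q) * C
    return u_inv(u_L), u_inv(u_H)

# Assumed illustrative parameters (not from the text), with q < k.
U_R, C, q, k = 0.0, 0.5, 0.5, 0.75
s_L, s_H = second_best_payments(math.exp, U_R, C, q, k)

p_H = 1 + q - k                      # Pr(x_H | a = 1)
eu_effort = p_H * math.log(s_H) + (1 - p_H) * math.log(s_L) - C
eu_no_effort = (1 - k) * math.log(s_H) + k * math.log(s_L)

expected_payment = p_H * s_H + (1 - p_H) * s_L
s1_F = math.exp(U_R + C)             # full-information payment
risk_premium = expected_payment - s1_F
print(round(s_L, 4), round(s_H, 4), round(risk_premium, 4))
```

The positive gap between the expected payment and the full-information payment is the compensation the risk-averse agent requires for bearing risk, as Jensen's inequality predicts.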

Before proceeding, it is worth considering why the manufacturer (principal) 
suffers from its inability to verify the salesperson's (agent's) action (i.e., 

10 Jensen's inequality for convex functions states that if $g(\cdot)$ is a convex function on an interval 
of $\mathbb{R}$ (and equal to zero outside that interval), then $E\{g(X)\} \geq g(EX)$, where $X$ is a random 
variable and $E$ is the expectations operator with respect to $X$ (see, e.g., van Tiel, 1984, p. 11, 
for a proof). If $g(\cdot)$ is strictly convex and the distribution of $X$ is not degenerate (i.e., does 
not concentrate all mass on one point), then the inequality is strict. For concave functions, 
the inequalities are reversed. 

from the existence of a hidden-action problem). Ceteris paribus, the salesperson 
prefers a = 0 to a = 1, because expending effort is personally costly to 
him. Hence, when the manufacturer wishes to induce a = 1, its interests and 
the salesperson's are not aligned. To align their interests, the manufacturer 
must offer the salesperson incentives to choose a = 1. The problem is that 
the manufacturer cannot directly tie these incentives to the variable in which 
it is interested, namely the action itself. Rather, it must tie these incentives 
to sales, which are imperfectly correlated with action (provided q < k). These 
incentives, therefore, expose the agent to risk. We know, relative to the first 
best, that this is inefficient. Someone must bear the cost of this inefficiency. 
Because the bargaining game always yields the salesperson the same expected 
utility (i.e., IR is always binding), the cost of this inefficiency must, thus, be 
borne by the manufacturer. 

Another way to view this last point is that because the agent is exposed to 
risk, which he dislikes, he must be compensated. This compensation takes the 
form of a higher expected payment. 

To begin to appreciate the importance of the hidden-action problem, observe 
that 

$$\lim_{q \uparrow k} \left[(1 + q - k)\,s_H + (k - q)\,s_L\right] = \lim_{q \uparrow k} s_H = s_1^F.$$

Hence, when q = k, there is effectively no hidden-action problem: Low sales, $x_L$, 
constitute proof that the salesperson failed to expend effort, because $\Pr\{x = x_L \mid a = 1\} = 0$ 
in that case. The manufacturer is, thus, free to punish the salesperson for low 
sales in whatever manner it sees fit, thereby deterring a = 0. But because there 
is no risk when a = 1, the manufacturer does not have to compensate the salesperson 
for bearing risk and can, thus, satisfy the IR constraint paying the same 
compensation as under full information. When q = k, we have what is known as 
a shifting support. 11 We will consider shifting supports in greater depth later. 
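
The limit can also be seen numerically. The sketch below is illustrative only and rests on assumptions not in the text: log utility, made-up values of $U_R$, $C$, and $k$, and a hypothetical function name. It evaluates the expected payment implied by (4) for values of q approaching k.

```python
import math

def expected_payment(q, k, U_R=0.0, C=0.5, u_inv=math.exp):
    """Expected payment under the contract in (4), given a = 1."""
    s_H = u_inv(U_R + (k / q) * C)
    s_L = u_inv(U_R - ((1 - k) / q) * C)
    return (1 + q - k) * s_H + (k - q) * s_L

k = 0.75                              # assumed value
s1_F = math.exp(0.0 + 0.5)            # full-information payment u^{-1}(U_R + C)

# Excess of the expected payment over s1_F as q rises toward k.
gaps = [expected_payment(q, k) - s1_F for q in (0.25, 0.5, 0.7, 0.75)]
print([round(g, 4) for g in gaps])
```

In this example the excess expected payment shrinks as q approaches k and vanishes at q = k, where low sales can only follow from a = 0.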

11 The support of distribution $G$ over random variable $X$, sometimes denoted supp{X}, is 
the set of $x$'s such that for all $\varepsilon > 0$, 

$$G(x) - G(x - \varepsilon) > 0.$$

Loosely speaking, it is the set of $x$'s that have positive probability of occurring. For example, 
if $X$ is the outcome of the roll of a die, then $5 \in \text{supp}\{X\}$, since 

$$G(5) - G(5 - \varepsilon) \geq \tfrac{1}{6} \quad \forall \varepsilon > 0.$$

Whereas $5.5 \notin \text{supp}\{X\}$, since 

$$G(5.5) - G(5.5 - \varepsilon) = 0 \quad \text{if } \varepsilon < .5.$$

Likewise, if $X$ is uniformly distributed on [0, 1], then any $x \in (0, 1]$ is in supp{X}, since 
$$G(x) - G(x - \varepsilon) = \min\{x, \varepsilon\}.$$

To see the importance of the salesperson's risk aversion, note that were the 
salesperson risk neutral, then the inequality in (5) would, instead, be an equality 
and the expected wage paid the salesperson would equal the wage paid under 
full information. Given that the manufacturer is risk neutral by assumption, 
it would be indifferent between an expected wage of $s_1^F$ and paying $s_1^F$ with 
certainty: There would be no loss, relative to full information, from overcoming 
the hidden-action problem by basing compensation on sales. It is important to 
note, however, that assuming a risk-neutral agent does not obviate the need to 
pay contingent compensation (e.g., we still need $s_H > s_L$), as can be seen by 
checking the IC constraint; agent risk neutrality only means that the principal 
suffers no loss from the fact that the agent's action is hidden. 

We can also analyze this version of the model graphically. A graphical 
treatment is facilitated by switching from compensation space to utility space; 
that is, rather than put sl and sh on the axes, we put ul = u(sl) and uh = 
u (sh) on the axes. With this change of variables, program (3) becomes: 

$$\max_{u_L,\, u_H}\ (1 + q - k)\left(x_H - u^{-1}(u_H)\right) + (k - q)\left(x_L - u^{-1}(u_L)\right)$$

subject to

$$(1 + q - k)\, u_H + (k - q)\, u_L - C \ge (1 - k)\, u_H + k\, u_L \quad \text{and} \tag{IC''}$$
$$(1 + q - k)\, u_H + (k - q)\, u_L - C \ge U_R. \tag{IR''}$$

Observe that, in this space, the salesperson's indifference curves are straight 
lines, with lines farther from the origin corresponding to greater expected utility. 
The manufacturer's iso-expected-profit curves are concave relative to the origin, 
with curves closer to the origin corresponding to greater expected profit. Figure 1
illustrates. Note that in Figure 1 the salesperson's indifference curves and
the manufacturer's iso-expected-profit curves are tangent only at the 45° line, a 
well-known result from the insurance literature. 12 This shows, graphically, why 
efficiency (in a first-best sense) requires that the agent not bear risk. 

We can re-express (IC'') as

$$u_H \ge u_L + \frac{C}{q}. \tag{6}$$

Hence, the set of incentive-compatible contracts lies on or above a line that is
above, and parallel to, the 45° line. Graphically, we now see that an incentive-compatible
contract requires that we abandon non-contingent contracts. Figure 2
shows the space of incentive-compatible contracts.
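As a rough numerical companion to the graphical argument, the following Python sketch computes the "corner" contract at which (6) and the participation constraint both hold with equality, which is where the text places the optimum. The function names and the parameter values are illustrative assumptions, not part of the original analysis.

```python
def corner_contract(q, k, C, U_R):
    """Utility pair (u_L, u_H) at which IC and IR both hold with equality.

    IC binding (equation (6)):  u_H = u_L + C / q
    IR binding:                 (1 + q - k) * u_H + (k - q) * u_L - C = U_R
    Solving the two linear equations gives the closed forms below.
    """
    if not 0 < q <= k <= 1:
        raise ValueError("need 0 < q <= k <= 1")
    u_L = U_R - (1 - k) * C / q
    u_H = U_R + k * C / q
    return u_L, u_H


def is_incentive_compatible(u_L, u_H, q, C, tol=1e-12):
    """True when (u_L, u_H) lies on or above the line u_H = u_L + C / q."""
    return u_H >= u_L + C / q - tol


# Illustrative parameters (assumptions, not values from the text).
q, k, C, U_R = 0.25, 0.5, 1.0, 2.0
u_L, u_H = corner_contract(q, k, C, U_R)
expected_utility = (1 + q - k) * u_H + (k - q) * u_L - C
print(u_L, u_H, expected_utility)
```

With these numbers the corner sits exactly on the line $u_H = u_L + C/q$, and the salesperson's expected utility equals $U_R$. A non-contingent contract such as $u_L = u_H$ fails the incentive check.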

12 Proof: Let $\phi(\cdot) = u^{-1}(\cdot)$. Then the MRS for the manufacturer is

$$\frac{(k - q)\, \phi'(u_L)}{(1 + q - k)\, \phi'(u_H)},$$

whereas the MRS for the salesperson is

$$\frac{k - q}{1 + q - k}.$$

Since $\phi(\cdot)$ is strictly convex, $\phi'(u) = \phi'(v)$ only if u = v; that is, the MRS's of the appropriate
iso-curves can be tangent only on the 45° line.

[Figure 1 (image not reproduced). Labels in the plot: "Retailer's indifference curves. His expected utility increases as curves shift out."; "45°"; "Manufacturer's iso-expected-profit curves. Its expected profit increases as curves shift in."]

Figure 1: Indifference curves in utility space for the manufacturer (principal) and
salesperson (agent).

The set of individually rational contracts consists of those that lie on or above the
line defined by (IR''). This is also illustrated in Figure 2. The intersection
of these two regions then constitutes the set of feasible contracts for inducing
the salesperson to choose a = 1. Observe that the lowest iso-expected-profit
curve (corresponding to the largest expected profit) that intersects this set is
the one that passes through the "corner" of the set, consistent with our earlier
conclusion that both constraints are binding at the optimal contract.

Figure 2: The set of feasible contracts.

Lastly let's consider the variable q. We can interpret q as representing the
correlation (or, more accurately, the informativeness) of sales to the action
taken. 13 At first glance, it might seem odd to be worried about the informativeness
of sales since, in equilibrium, the principal can accurately predict the
agent's choice of action from the structure of the game and her knowledge of the
contract. But that's not the point: The principal is forced to design a contract
that pays the agent based on performance measures that are informative about
the variable upon which she would truly like to contract, namely his action.
The more informative these performance measures are (loosely, the more correlated
they are with action), the closer the principal is getting to the ideal of
contracting on the agent's action.

13 To see this, in a loose sense, suppose the salesperson is playing the mixed strategy in
which he chooses a = 1 with probability $\beta \in (0, 1)$. Then, ex post, the probability that he
chose a = 1 conditional on $x = x_H$ is

$$\frac{(1 + q - k)\, \beta}{(1 + q - k)\, \beta + (1 - k)(1 - \beta)}$$

by Bayes' Theorem. It is readily seen that this probability is increasing in q; moreover, it
reduces to $\beta$ (nothing has been learned) if q = 0. Similarly, the posterior probability of
a = 1 conditional on $x = x_L$ is

$$\frac{(k - q)\, \beta}{(k - q)\, \beta + k\, (1 - \beta)}.$$

It is readily seen that this is decreasing in q; moreover, it reduces to $\beta$ (nothing has been
learned) if q = 0.

In light of this discussion it wouldn't be surprising if the manufacturer's
expected profit under the optimal contract for inducing a = 1 increases as q
increases. Clearly its expected revenue,

$$(1 + q - k)\, x_H + (k - q)\, x_L,$$

is increasing in q. Hence, it is sufficient to show merely that its expected cost,

$$(1 + q - k)\, s_H + (k - q)\, s_L,$$

is non-increasing in q. To do so, it is convenient to work in terms of utility
(i.e., $u_H$ and $u_L$) rather than directly with compensation. Let $q_1$ and $q_2$ be two
distinct values of q, with $q_1 < q_2$. Let $\{u_n^1\}$ be the optimal contract (expressed
in utility terms) when $q = q_1$.

Lemma 1 There exist $a \in (0, 1)$ and $b \in (0, 1)$ such that

$$a\,(1 + q_2 - k) + b\,(k - q_2) = 1 + q_1 - k;$$
$$(1 - a)(1 + q_2 - k) + (1 - b)(k - q_2) = k - q_1;$$
$$a\,(1 - k) + b\,k = 1 - k; \text{ and}$$
$$(1 - a)(1 - k) + (1 - b)\,k = k.$$

Proof: Let

$$a = \frac{q_1 k + q_2 (1 - k)}{q_2} \quad \text{and} \quad b = \frac{(1 - k)(q_2 - q_1)}{q_2}.$$

Simple, albeit tedious, algebra confirms that a and b solve the above equations. Clearly,
since $q_2 > q_1$, $a \in (0, 1)$. To see that b is also, observe that

$$(1 - k)(q_2 - q_1) < q_2 - q_1 < q_2.$$

In light of this result, define

$$\tilde{u}_H = a\, u_H^1 + (1 - a)\, u_L^1 \quad \text{and}$$
$$\tilde{u}_L = b\, u_H^1 + (1 - b)\, u_L^1,$$

where a and b are defined by Lemma 1. Let us now see that $\{\tilde{u}_n\}$ satisfies both
the IR and IC constraints when $q = q_2$. Observe first that

$$\begin{aligned}
(1 + q_2 - k)\, \tilde{u}_H + (k - q_2)\, \tilde{u}_L
&= \left[ a (1 + q_2 - k) + b (k - q_2) \right] u_H^1 + \left[ (1 - a)(1 + q_2 - k) + (1 - b)(k - q_2) \right] u_L^1 \\
&= (1 + q_1 - k)\, u_H^1 + (k - q_1)\, u_L^1
\end{aligned}$$

and

$$\begin{aligned}
(1 - k)\, \tilde{u}_H + k\, \tilde{u}_L
&= \left[ a (1 - k) + b k \right] u_H^1 + \left[ (1 - a)(1 - k) + (1 - b) k \right] u_L^1 \\
&= (1 - k)\, u_H^1 + k\, u_L^1.
\end{aligned}$$

Since $\{u_n^1\}$ satisfies IR and IC when $q = q_1$, it follows, by transitivity, that $\{\tilde{u}_n\}$
solves IR and IC when $q = q_2$. 14 We need, now, simply show that

$$(1 + q_2 - k)\, u^{-1}(\tilde{u}_H) + (k - q_2)\, u^{-1}(\tilde{u}_L) \le (1 + q_1 - k)\, u^{-1}(u_H^1) + (k - q_1)\, u^{-1}(u_L^1).$$

14 Note, although it isn't necessary for our analysis, that since we know in this setting that
the contract that solves both IR and IC is the optimal contract for inducing a = 1, we've just
shown that $\{\tilde{u}_n\}$ is the optimal contract when $q = q_2$.

To do this, observe that

$$u^{-1}(\tilde{u}_H) \le a\, u^{-1}(u_H^1) + (1 - a)\, u^{-1}(u_L^1) \quad \text{and}$$
$$u^{-1}(\tilde{u}_L) \le b\, u^{-1}(u_H^1) + (1 - b)\, u^{-1}(u_L^1),$$

by Jensen's inequality (recall $u^{-1}(\cdot)$ is convex). Hence we have

$$\begin{aligned}
(1 + q_2 - k)\, u^{-1}(\tilde{u}_H) + (k - q_2)\, u^{-1}(\tilde{u}_L)
&\le (1 + q_2 - k) \left[ a\, u^{-1}(u_H^1) + (1 - a)\, u^{-1}(u_L^1) \right] \\
&\quad + (k - q_2) \left[ b\, u^{-1}(u_H^1) + (1 - b)\, u^{-1}(u_L^1) \right] \\
&= \left[ a (1 + q_2 - k) + b (k - q_2) \right] u^{-1}(u_H^1) \\
&\quad + \left[ (1 - a)(1 + q_2 - k) + (1 - b)(k - q_2) \right] u^{-1}(u_L^1) \\
&= (1 + q_1 - k)\, u^{-1}(u_H^1) + (k - q_1)\, u^{-1}(u_L^1);
\end{aligned}$$

that is, the manufacturer's expected cost is no greater when $q = q_2$ than when $q = q_1$,
as was to be shown.
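The argument can be spot-checked numerically. The sketch below assumes a specific utility function, u(s) = ln s (so that $u^{-1}$ is the exponential), and uses the fact that both constraints bind at the optimum to write the contract in closed form. The function names and parameter values are illustrative assumptions only.

```python
import math


def lemma_weights(q1, q2, k):
    """Weights a and b from the proof of Lemma 1 (requires 0 < q1 < q2 <= k)."""
    a = (q1 * k + q2 * (1 - k)) / q2
    b = (1 - k) * (q2 - q1) / q2
    return a, b


def expected_cost(q, k, C, U_R, u_inv=math.exp):
    """Expected payment when IC and IR both bind.

    u_inv is the inverse utility function; math.exp corresponds to the
    assumed utility u(s) = ln(s).
    """
    u_L = U_R - (1 - k) * C / q
    u_H = U_R + k * C / q
    return (1 + q - k) * u_inv(u_H) + (k - q) * u_inv(u_L)


k, C, U_R = 0.5, 1.0, 0.0      # illustrative values
q1, q2 = 0.2, 0.4
a, b = lemma_weights(q1, q2, k)

# The four identities of Lemma 1, as (left-hand side, right-hand side) pairs.
identities = [
    (a * (1 + q2 - k) + b * (k - q2), 1 + q1 - k),
    ((1 - a) * (1 + q2 - k) + (1 - b) * (k - q2), k - q1),
    (a * (1 - k) + b * k, 1 - k),
    ((1 - a) * (1 - k) + (1 - b) * k, k),
]
assert all(abs(lhs - rhs) < 1e-12 for lhs, rhs in identities)

# Expected cost on a grid of q values, ending at the shifting-support case q = k.
costs = [expected_cost(q, k, C, U_R) for q in (0.1, 0.2, 0.3, 0.4, 0.5)]
print(costs)
```

Under these assumptions the printed costs fall as q rises, and at q = k the cost equals exp(U_R + C), the full-information wage for this utility function.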

Summary: Although we've considered the simplest of agency models in 
this section, there are, nevertheless, some general lessons that come from this. 
First, the optimal contract for inducing an action other than the action that 
the agent finds least costly requires a contract that is fully contingent on the 
performance measure. This is a consequence of the action being unobservable 
to the principal, not the agent's risk aversion. When, however, the agent is risk 
averse, then the principal's expected cost of solving the hidden-action problem 
is greater than it would be in the benchmark full-information case: Exposing 
the agent to risk is inefficient (relative to the first best) and the cost of this 
inefficiency is borne by the principal. The size of this cost depends on how good 
an approximation the performance measure is for the variable upon which the 
principal really desires to contract, the agent's action. The better an approx- 
imation (statistic) it is, the lower is the principal's expected cost. If, as here, 
that shift also raises expected revenue, then a more accurate approximation 
means greater expected profits. It is also worth pointing out that one result, 
which might seem as though it should be general, is not: Namely, the result that 
compensation is increasing with performance (e.g., $\hat{s}_H > \hat{s}_L$). Although this is
true when there are only two possible realizations of the performance measure 
(as we've proved), this result does not hold generally when there are more than 
two possible realizations. 

2.4 Multiple-outcomes model 

Now we assume that there are multiple possible outcomes, including, possibly,
an infinite number. Without loss of generality, we may assume the set of possible
sales levels is $(0, \infty)$, given that impossible levels in this range can be assigned
zero probability. We will also assume, henceforth, that $u(\cdot)$ exhibits strict risk
aversion (i.e., is strictly concave).

Recall that our problem is to solve program (3), on page 9. In this context,
we can rewrite the problem as

$$\max_{S(\cdot)} \int_0^{\infty} \left( x - S(x) \right) dF_1(x)$$

subject to

$$\int_0^{\infty} u[S(x)]\, dF_1(x) - C \ge U_R \quad \text{and} \tag{7}$$

$$\int_0^{\infty} u[S(x)]\, dF_1(x) - C \ge \int_0^{\infty} u[S(x)]\, dF_0(x), \tag{8}$$

which are the IR and IC constraints, respectively. In what follows, we assume
that there is a well-defined density function, $f_a(\cdot)$, associated with $F_a(\cdot)$ for
both a. For instance, if $F_a(\cdot)$ is differentiable everywhere, then $f_a(\cdot) = F_a'(\cdot)$
and the $\int dF_a(x)$ notation could be replaced with $\int f_a(x)\, dx$. Alternatively,
the possible outcomes could be discrete, $x = x_1, \ldots, x_N$, in which case

$$f_a(\hat{x}) = F_a(\hat{x}) - \lim_{x \uparrow \hat{x}} F_a(x)$$

and the $\int dF_a(x)$ notation could be replaced with $\sum_{n=1}^{N} f_a(x_n)$.

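We solve the above program using standard Kuhn-Tucker techniques. Let
$\mu$ be the (non-negative) Lagrange multiplier on the incentive constraint and
let $\lambda$ be the (non-negative) Lagrange multiplier on the individual rationality
constraint. The Lagrangian of the problem is, thus,

$$\begin{aligned}
\mathcal{L}(S(\cdot), \lambda, \mu) ={} & \int_0^{+\infty} [x - S(x)]\, dF_1(x) + \lambda \left( \int_0^{+\infty} u(S(x))\, dF_1(x) - C - U_R \right) \\
& + \mu \left( \int_0^{+\infty} u(S(x))\, dF_1(x) - \int_0^{+\infty} u(S(x))\, dF_0(x) - C \right).
\end{aligned}$$

The necessary first-order conditions are $\lambda \ge 0$, $\mu \ge 0$, (7), (8),

$$u'[S(x)] \left( \lambda + \mu \left[ 1 - \frac{f_0(x)}{f_1(x)} \right] \right) - 1 = 0, \tag{9}$$

$$\lambda > 0 \Rightarrow \int_0^{+\infty} u(S(x))\, dF_1(x) = C + U_R, \text{ and}$$

$$\mu > 0 \Rightarrow \int_0^{+\infty} u(S(x))\, dF_1(x) - \int_0^{+\infty} u(S(x))\, dF_0(x) = C.$$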

From our previous reasoning, we already know that the IC constraint is
binding. To see this again, observe that if it were not (i.e., $\mu = 0$), then (9)
would reduce to $u'[S(x)] = 1/\lambda$ for all x; that is, a fixed payment. 15 But we
know a fixed-payment contract is not incentive compatible. It is also immediate
that the participation constraint must be satisfied as an equality; otherwise,
the manufacturer could reduce the payment schedule, thereby increasing her
profits, in a manner that preserved the incentive constraint (i.e., replace $S^*(x)$
with $\tilde{S}(x)$, where $\tilde{S}(x) = u^{-1}(u[S^*(x)] - \varepsilon)$).

15 Note that we've again established the result that, absent an incentive problem, a risk-neutral
player should absorb all the risk when trading with a risk-averse player.

Note that the necessary conditions above are also sufficient given the assumed
concavity of $u(\cdot)$. At every point where the Lagrangian is maximized with respect to
s, the term

$$\lambda + \mu \left[ 1 - \frac{f_0(x)}{f_1(x)} \right]$$

must be positive (observe it equals $1/u'(s) > 0$), so the second derivative with respect to s must, therefore, be negative.

The least-cost contract inducing a = 1 therefore corresponds to a payment
schedule $S(\cdot)$ that varies with the level of sales in a non-trivial way given by
(9). That expression might look complicated, but its interpretation is central
to the model and easy to follow. Observe, in particular, that because $u'(\cdot)$
is a decreasing function ($u(\cdot)$, recall, is strictly concave), S(x) is positively
correlated with

$$1 - \frac{f_0(x)}{f_1(x)};$$

that is, the larger (smaller) is this term, the larger (smaller) is S(x).

The reward for a given level of sales x depends upon the likelihood ratio

$$r(x) = \frac{f_0(x)}{f_1(x)}$$

of the probability that sales are x when action a = 0 is taken relative to that
probability when action a = 1 is taken. 16 This ratio has a clear statistical
meaning: It measures how much more likely it is that the distribution from which sales
have been determined is $F_0(\cdot)$ rather than $F_1(\cdot)$. When r(x) is high, observing
sales equal to x allows the manufacturer to draw a statistical inference that it
is much more likely that the distribution of sales was actually $F_0(\cdot)$; that is, the
salesperson did not expend effort promoting the product. In this case,

$$\lambda + \mu \left[ 1 - \frac{f_0(x)}{f_1(x)} \right]$$

is small (but necessarily positive) and S(x) must be small as well. When
r(x) is small, the manufacturer should feel rather confident that the salesperson
expended effort and it should, then, optimally reward him highly. That is,
sales levels that are relatively more likely when the agent has behaved in the
desired manner result in larger payments to the agent than sales levels that are
relatively rare when the agent has behaved in the desired manner.
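The statistical reading of the likelihood ratio can be illustrated with a few lines of Python. The sketch below compares, for hypothetical discrete densities, the posterior probability of effort with the term $\lambda + \mu[1 - r(x)]$ that drives the payment in (9). The densities, the prior, and the multiplier values are all assumptions chosen for illustration; in the model the multipliers are determined by the binding constraints.

```python
from fractions import Fraction as Fr


def posterior_effort(f0_x, f1_x, prior):
    """Pr{a = 1 | x} by Bayes' rule, given f_0(x), f_1(x) and a prior
    probability `prior` in (0, 1) that a = 1 was chosen."""
    return prior * f1_x / (prior * f1_x + (1 - prior) * f0_x)


def payment_index(f0_x, f1_x, lam, mu):
    """The term lambda + mu * (1 - r(x)) that governs S(x) in condition (9)."""
    return lam + mu * (1 - f0_x / f1_x)


# Hypothetical densities over three sales levels, ordered low to high.
f0 = [Fr(1, 2), Fr(3, 10), Fr(1, 5)]
f1 = [Fr(1, 5), Fr(3, 10), Fr(1, 2)]
prior, lam, mu = Fr(1, 2), Fr(2), Fr(1)   # illustrative values only

ratios = [p0 / p1 for p0, p1 in zip(f0, f1)]
posteriors = [posterior_effort(p0, p1, prior) for p0, p1 in zip(f0, f1)]
indices = [payment_index(p0, p1, lam, mu) for p0, p1 in zip(f0, f1)]
print(ratios, posteriors, indices)
```

In this example the likelihood ratio falls across the three sales levels while both the posterior and the payment index rise, which is the ordering described in the text.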



16 Technically, if $F_a(\cdot)$ is differentiable (i.e., $f_a(\cdot)$ is a probability density function), then
the likelihood ratio can be interpreted as the probability that sales lie in (x, x + dx) when
a = 0 divided by the probability of sales lying in that interval when a = 1.

The minimum-cost incentive contract that induces the costly action a = 1
in essence commits the principal (manufacturer) to behave like a Bayesian
statistician who holds some diffuse prior over which action the agent has taken: 17
She should use the observation of sales to revise her beliefs about what action
the agent took and she should reward the agent more for outcomes that cause
her to revise upward her beliefs that he took the desired action and she should
reward him less (punish him) for outcomes that cause a downward revision in
her beliefs. 18 As a consequence, the payment schedule is connected to sales only
through their statistical content (the relative differences in the densities), not
through their accounting properties. In particular, there is now no reason to
believe that higher sales (larger x) should be rewarded more than lower sales.
As an example, suppose that there are three possible sales levels: low, medium,
and high ($x_L$, $x_M$, and $x_H$, respectively). Suppose, in addition, that

$$f_a(x) = \begin{cases}
\dfrac{1}{3}, & \text{if } x = x_L \\[1ex]
\dfrac{2 - a}{6}, & \text{if } x = x_M \\[1ex]
\dfrac{2 + a}{6}, & \text{if } x = x_H.
\end{cases}$$

Then

$$\frac{1}{u'[S(x)]} = \begin{cases}
\lambda, & \text{if } x = x_L \\
\lambda - \mu, & \text{if } x = x_M \\
\lambda + \dfrac{\mu}{3}, & \text{if } x = x_H.
\end{cases}$$



Hence, low sales are rewarded more than medium sales — low sales are unin- 
formative about the salesperson's action, whereas medium sales suggest that 
the salesperson has not invested. Admittedly, non-monotonic compensation is 
rarely, if ever, observed in real life. We will see below what additional properties 
are required, in this model, to ensure monotonic compensation. 
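A worked version of a three-outcome example makes the non-monotonicity concrete. The Python sketch below assumes the utility function u(s) = 2√s, which is not specified in the text; it is convenient because $1/u'(s) = \sqrt{s}$, so condition (9) and the two binding constraints can be solved in closed form. The densities used, $f_0 = (1/3, 1/3, 1/3)$ and $f_1 = (1/3, 1/6, 1/2)$ over $(x_L, x_M, x_H)$, are likewise an assumption, chosen so that low sales are uninformative and medium sales are twice as likely without effort. The values of C and $U_R$ are illustrative.

```python
from fractions import Fraction as Fr


def sqrt_utility_contract(f0, f1, C, U_R):
    """Least-cost contract inducing a = 1 under the assumed u(s) = 2 * sqrt(s).

    Here 1 / u'(s) = sqrt(s) = u(s) / 2, so condition (9) gives
        u(S(x)) = 2 * (lambda + mu * (1 - f0(x) / f1(x))).
    Binding IR and IC are then linear in the multipliers:
        lambda = (C + U_R) / 2
        mu     = C / (2 * sum((f1 - f0)**2 / f1))
    Returns (lambda, mu, payments). Valid only if every index is positive.
    """
    lam = Fr(C + U_R) / 2
    mu = Fr(C) / (2 * sum((p1 - p0) ** 2 / p1 for p0, p1 in zip(f0, f1)))
    index = [lam + mu * (1 - p0 / p1) for p0, p1 in zip(f0, f1)]
    if min(index) <= 0:
        raise ValueError("closed form invalid: index must be positive")
    return lam, mu, [v ** 2 for v in index]


# Assumed densities over (x_L, x_M, x_H) and illustrative C, U_R.
f0 = [Fr(1, 3), Fr(1, 3), Fr(1, 3)]
f1 = [Fr(1, 3), Fr(1, 6), Fr(1, 2)]
lam, mu, payments = sqrt_utility_contract(f0, f1, C=1, U_R=4)

# Check that both constraints bind: utilities are 2 * sqrt(payment).
utils = [2 * (lam + mu * (1 - p0 / p1)) for p0, p1 in zip(f0, f1)]
eu_effort = sum(p * u for p, u in zip(f1, utils)) - 1   # net of C = 1
eu_shirk = sum(p * u for p, u in zip(f0, utils))
print(payments, eu_effort, eu_shirk)
```

Under these assumptions the payment for low sales exceeds the payment for medium sales, and the salesperson is exactly indifferent between the two actions while receiving his reservation utility.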

17 A diffuse prior is one that assigns positive probability to each possible action.

18 Of course, as a rational player of the game, the principal can infer that, if the contract
is incentive compatible, the agent will have taken the desired action. Thus, there is not, in
some sense, a real inference problem. Rather the issue is that, to be incentive compatible, the
principal must commit to act as if there were an inference problem.

Note, somewhat implicit in our analysis to this point, is an assumption that
$f_1(x) > 0$ except, possibly, on a subset of x that are impossible (have zero
measure). Without this assumption, (9) would entail division by zero, which is,
of course, not permitted. If, however, we let $f_1(\cdot)$ go to zero on some subset of
x that had positive measure under $F_0(\cdot)$, then we see that $\mu$ must also tend to
zero since

$$\lambda + \mu \left[ 1 - \frac{f_0(x)}{f_1(x)} \right]$$

must be positive. In essence, then, the shadow price (cost) of the incentive
constraint is vanishing as $f_1(\cdot)$ goes to zero. This makes perfect sense: Were
$f_1(\cdot)$ zero on some subset of x that could occur (had positive measure) under
$F_0(\cdot)$, then the occurrence of any x in this subset, $X_0$, would be proof that the
agent had failed to take the desired action. We can use this, then, to design
a contract that induces a = 1, but which costs the principal no more than
the optimal full-information fixed-payment contract $S(x) = s_1^F$. That is, the
incentive problem ceases to be costly; so, not surprisingly, its shadow cost is
zero.

To see how we can construct such a contract when $f_1(x) = 0$ for all $x \in X_0$,
let

$$S(x) = \begin{cases}
\underline{s} + \varepsilon, & \text{if } x \in X_0 \\
s_1^F, & \text{if } x \notin X_0,
\end{cases}$$

where $\varepsilon > 0$ is arbitrarily small ($\underline{s}$, recall, is the greatest lower bound of the
domain of $u(\cdot)$). Then

$$\int_0^{+\infty} u(S(x))\, dF_1(x) = u(s_1^F) \quad \text{and}$$

$$\begin{aligned}
\int_0^{+\infty} u(S(x))\, dF_0(x) &= \int_{X_0} u(\underline{s} + \varepsilon)\, dF_0(x) + \int_{\mathbb{R}_+ \setminus X_0} u(s_1^F)\, dF_0(x) \\
&= u(\underline{s} + \varepsilon)\, F_0(X_0) + u(s_1^F) \left( 1 - F_0(X_0) \right).
\end{aligned}$$

From the last expression, it's clear that $\int_0^{+\infty} u(S(x))\, dF_0(x) \to -\infty$ as $\varepsilon \to 0$;
hence, the IC constraint is met trivially. By the definition of $s_1^F$, IR is also
met. That is, this contract implements a = 1 at full-information cost. Again, as
we saw in the two-outcome model, having a shifting support (i.e., the property
that $F_0(X_0) > 0 = F_1(X_0)$) allows us to implement the desired action at full-information
cost.
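The construction can be checked with a small Python sketch. It assumes u(s) = ln s, for which the greatest lower bound of the domain is 0 and utility is unbounded below, and it summarizes the distributions by a single hypothetical number, the probability $F_0(X_0)$ that sales land in $X_0$ when the agent shirks. All names and parameter values are illustrative.

```python
import math


def shifting_support_contract(eps, s_full_info, s_floor=0.0):
    """Return S as a function of whether the realized x lies in X_0."""
    def S(x_in_X0):
        return s_floor + eps if x_in_X0 else s_full_info
    return S


def expected_utilities(S, prob_X0_given_a0, C, u=math.log):
    """Agent's expected utility from a = 1 (net of C) and from a = 0,
    assuming F_1(X_0) = 0 and F_0(X_0) = prob_X0_given_a0."""
    eu_effort = u(S(False)) - C
    eu_shirk = (prob_X0_given_a0 * u(S(True))
                + (1 - prob_X0_given_a0) * u(S(False)))
    return eu_effort, eu_shirk


U_R, C = 0.0, 1.0                      # illustrative values
s_full_info = math.exp(U_R + C)        # solves ln(s) - C = U_R
S = shifting_support_contract(eps=1e-6, s_full_info=s_full_info)
eu_effort, eu_shirk = expected_utilities(S, prob_X0_given_a0=0.3, C=C)
print(eu_effort, eu_shirk)
```

With these numbers effort yields the reservation utility, shirking yields strictly less, and the principal's expected payment under a = 1 is just the full-information wage, since outcomes in $X_0$ never occur when the agent works.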

Let us conclude this sub-section by explaining how to answer the final question
about the optimal contract. The manufacturer faces the following alternative:
Either it imposes the fixed-payment contract $s_0^F$, which induces action
a = 0, or it imposes the contract $S(\cdot)$ that has been derived above, which induces
action a = 1. The expected-profit-maximizing choice results from the
simple comparison of these two contracts; that is, the manufacturer imposes the
incentive contract $S(\cdot)$ if and only if:

$$E_0[x] - s_0^F \le E_1[x] - E_1[S(x)]. \tag{10}$$

The right-hand side of this inequality corresponds to the value of the maximization
program (3). Given that the incentive constraint is binding in this program,
this value is strictly smaller than the value of the same program without the
incentive constraint; hence, just as we saw in the two-outcome case, the value
is smaller than full-information profits, $E_1[x] - s_1^F$. Observe, therefore, that it
is possible that

$$E_1[x] - s_1^F > E_0[x] - s_0^F > E_1[x] - E_1[S(x)]:$$

Under full information the principal would induce a = 1, but not if there's a
hidden-action problem. In other words, imperfect observability of the agent's
action imposes a cost on the principal that may induce her to distort the action
that she induces the agent to take.
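A small numerical case shows how the distortion can arise. For tractability the sketch below goes back to the two-outcome model and assumes u(s) = ln s, so that every wage has a closed form; the function names and parameter values are illustrative assumptions. The comparison in (10) is then a one-line test.

```python
import math


def induce_effort(profit_a0, profit_a1):
    """Inequality (10): induce a = 1 iff doing so is at least as profitable."""
    return profit_a0 <= profit_a1


def two_outcome_profits(x_L, x_H, q, k, C, U_R):
    """Expected profits under three regimes, assuming u(s) = ln(s)."""
    rev0 = (1 - k) * x_H + k * x_L
    rev1 = (1 + q - k) * x_H + (k - q) * x_L
    s0_F = math.exp(U_R)              # fixed wage that induces a = 0
    s1_F = math.exp(U_R + C)          # full-information wage for a = 1
    u_L = U_R - (1 - k) * C / q       # utilities with IC and IR binding
    u_H = U_R + k * C / q
    hidden_cost = (1 + q - k) * math.exp(u_H) + (k - q) * math.exp(u_L)
    return {
        "a0": rev0 - s0_F,
        "a1_full_info": rev1 - s1_F,
        "a1_hidden": rev1 - hidden_cost,
    }


# Illustrative parameters: sales are only weakly informative (q is small).
p = two_outcome_profits(x_L=0.0, x_H=50.0, q=0.1, k=0.5, C=1.0, U_R=0.0)
print(p)
print(induce_effort(p["a0"], p["a1_full_info"]), induce_effort(p["a0"], p["a1_hidden"]))
```

With these numbers the principal would induce a = 1 under full information but prefers a = 0 once the action is hidden, which is the ordering displayed above.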

2.5 Monotonicity of the optimal contract

Let us suppose that (10) is, indeed, satisfied so that the contract S(-) derived 
above is the optimal contract. Can we exhibit additional and meaningful as- 
sumptions that would imply interesting properties of the optimal contract? 

We begin with monotonicity, the idea that greater sales should mean greater
compensation for the salesperson. As we saw above (page 20), there is no guarantee
in the multiple-outcome model that this property should hold everywhere.
From (9), it does hold if and only if the likelihood ratio, r(x), is nowhere increasing
and is decreasing at least somewhere. As this is an important property, it has a
name:

Definition 1 (MLRP) The likelihood ratio $r(x) = f_0(x)/f_1(x)$ satisfies the
monotone likelihood ratio property (MLRP) if $r(\cdot)$ is non-increasing almost
everywhere and strictly decreasing on at least some set of x's that occur with
positive probability given action a = 1.

The MLRP states that the greater is the outcome (i.e., x), the greater the 
relative probability of x given a = 1 than given a = 0. 19 In other words, under 
MLRP, better outcomes are more likely when the salesperson expends effort 
than when he doesn't. To summarize: 

Proposition 1 In the model of this section, if the likelihood ratio, $r(\cdot)$, satisfies
the monotone likelihood ratio property, then the salesperson's compensation,
$S(\cdot)$, under the optimal incentive contract for inducing him to expend effort
(i.e., to choose a = 1) is non-decreasing everywhere.

In fact, because we know that $S(\cdot)$ can't be constant (a fixed-payment contract
is not incentive compatible for inducing a = 1), we can conclude that $S(\cdot)$ must,
therefore, be increasing over some set of x.



19 Technically, if $f_a(x) = F_a'(x)$, then we should say "the greater the relative probability of
an $\hat{x} \in (x, x + dx)$ given a = 1 than given a = 0."

Technical Aside

Before analyzing the MLRP further, it's worth noting that even if MLRP does not
hold, $r(\cdot)$ must be decreasing over some measurable range. To see this, suppose it
were not true; that is, suppose that $r(\cdot)$ is almost everywhere non-decreasing under
distribution $F_1(\cdot)$. Note this entails that x and r(x) are non-negatively correlated
under $F_1(\cdot)$. To make the exposition easier, suppose for the purpose of this aside that
$f_a(\cdot) = F_a'(\cdot)$. Then

$$E_1\left[ \frac{f_0(x)}{f_1(x)} \right] = \int_0^{\infty} \frac{f_0(x)}{f_1(x)}\, f_1(x)\, dx = \int_0^{\infty} f_0(x)\, dx = 1.$$

Because x and r(x) are non-negatively correlated, we thus have

$$\int_0^{\infty} x \left[ r(x) - E_1\{r(x)\} \right] f_1(x)\, dx \ge 0.$$

Substituting, this implies

$$0 \le \int_0^{\infty} x \left[ \frac{f_0(x)}{f_1(x)} - 1 \right] f_1(x)\, dx = \int_0^{\infty} x f_0(x)\, dx - \int_0^{\infty} x f_1(x)\, dx = E_0[x] - E_1[x].$$

But this contradicts our assumption that investing (a = 1) yields greater expected
revenues than does not investing (a = 0). Hence, by contradiction it must be that $r(\cdot)$
is decreasing over some measurable range. But then this means that $S(\cdot)$ is increasing
over some measurable range. However, without MLRP, we can't conclude that it's not
also decreasing over some other measurable range.

Conclusion: If $E_0[x] < E_1[x]$, then $S(\cdot)$ is increasing over some set of x that has
positive probability of occurring given action a = 1 even if MLRP does not hold.



Is MLRP a reasonable assumption? To some extent, it is simply a strengthening
of our assumption that $E_1[x] > E_0[x]$, given it can readily be shown that MLRP
implies $E_1[x] > E_0[x]$. 20 Moreover, many standard distributions satisfy MLRP.
But it is quite easy to exhibit meaningful distributions that do not. For instance,
consider our example above (page 20). We could model these distributions as
the consequence of a two-stage stochastic phenomenon: With probability 1/3,
a second new product is successfully introduced that eliminates the demand for
the manufacturer's product (i.e., $x_L = 0$). With probability 2/3, this second
product is not successfully introduced and it is "business as usual," with sales
being more likely to be $x_M$ if the salesperson doesn't expend effort and more
likely to be $x_H$ if he does. These "compounded" distributions do not satisfy
MLRP. In such a situation, MLRP is not acceptable and the optimal reward
schedule is not monotonic as we saw.
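For discrete outcomes, checking MLRP is mechanical. The Python sketch below reads MLRP as requiring that $r(x) = f_0(x)/f_1(x)$ never rise and fall at least once as x increases, which is what monotone compensation under (9) requires. The helper names are hypothetical, and both pairs of densities are illustrative; the first pair is one set of numbers consistent with the two-stage story above, not necessarily the authors' own.

```python
from fractions import Fraction as Fr


def likelihood_ratio(f0, f1):
    """r(x) = f0(x) / f1(x), outcome by outcome (requires f1 > 0)."""
    return [p0 / p1 for p0, p1 in zip(f0, f1)]


def satisfies_mlrp(f0, f1):
    """True if r is non-increasing in x and strictly decreasing somewhere.

    Outcomes are assumed to be listed from lowest to highest sales.
    """
    r = likelihood_ratio(f0, f1)
    steps = [nxt - cur for cur, nxt in zip(r, r[1:])]
    return all(d <= 0 for d in steps) and any(d < 0 for d in steps)


def mean(xs, f):
    """Expected sales under density f."""
    return sum(x * p for x, p in zip(xs, f))


# "Compounded" case: with probability 1/3 sales are x_L whatever the action.
compound_f0 = [Fr(1, 3), Fr(1, 3), Fr(1, 3)]
compound_f1 = [Fr(1, 3), Fr(1, 6), Fr(1, 2)]

# A second, purely illustrative pair in which effort shifts mass upward.
shift_f0 = [Fr(1, 2), Fr(3, 10), Fr(1, 5)]
shift_f1 = [Fr(1, 5), Fr(3, 10), Fr(1, 2)]

xs = [0, 1, 2]   # hypothetical sales levels
print(satisfies_mlrp(compound_f0, compound_f1), satisfies_mlrp(shift_f0, shift_f1))
print(mean(xs, shift_f0), mean(xs, shift_f1))
```

The compounded pair fails the check because the ratio rises from low to medium sales before falling, while the second pair passes and also has higher expected sales under effort.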

20 This can most readily be seen from the Technical Aside: Simply assume that $r(\cdot)$ satisfies
MLRP, which implies x and r(x) are negatively correlated. Then, following the remaining
steps, it quickly falls out that $E_1[x] > E_0[x]$.

Although there has been a lot of discussion in the literature on the monotonicity
issue, it may be overemphasized. If we return to the economic reality
that the mathematics seeks to capture, the discussion relies on the assumption
that a payment schedule with the feature that the salesperson is penalized for
increasing sales in some range does not actually induce a new agency problem.
For instance, if the good is perishable or costly to ship, it might be possible for
the salesperson to pretend, when sales are $x_M$, that they are $x_L$ (if the allegedly
unsold amount of the good cannot be returned). That is, a non-monotonic
incentive scheme could introduce a new agency problem of inducing the salesperson
to report his sales honestly. Of course, if the manufacturer can verify
the salesperson's inventory, a non-monotonic scheme might be possible. But
think of another situation where the problem is to provide a worker (the agent)
incentives to produce units of output; isn't it natural, then, to think that the
worker could very well stop his production at $x_L$ or destroy his extra production
$x_M - x_L$? Think of yet another situation where the problem is to provide
incentives to a manager; aren't there many ways to spend money in a hardly
detectable way so as to make profits look smaller than what they actually are?
In short, the point is that if the agent can freely and secretly diminish his performance,
then it makes no sense for the principal to have a reward schedule
that is decreasing with performance over some range. In other words, there is
often an economic justification for monotonicity even when MLRP doesn't hold.

2.6 Informativeness of the performance measures 

Now, we again explore the question of the informativeness of the performance 
measures used by the principal. To understand the issue, suppose that the man- 
ufacturer in our example can also observe the sales of another of its products 
sold by the salesperson. Let y denote these sales of the other good. These sales 
are, in part, also random, affected by forces outside the parties' control; but also, 
possibly, determined by how much effort the salesperson expends promoting the 
first product. For example, consumers could consider the two goods comple- 
ments (or substitutes). Sales y will then co-vary positively (or negatively) with 
sales x. Alternatively, both goods could be normal goods, so that sales of y 
could then convey information about general market conditions (the incomes 
of the customers). Of course, it could also be that the demand for the second 
product is wholly unrelated to the demand for the first; in which case sales y 
would be insensitive to the salesperson's action. 

Let $f_0(x, y)$ and $f_1(x, y)$ denote the joint probability densities of sales x and y
for action a = 0 and a = 1. An incentive contract can now be a function of both
performance variables; i.e., s = S(x, y). It is immediate that the same approach
as before carries through and yields the following optimality condition: 21

$$u'[S(x, y)] \left( \lambda + \mu \left[ 1 - \frac{f_0(x, y)}{f_1(x, y)} \right] \right) - 1 = 0. \tag{11}$$

21 Although we use the same letters for the Lagrange multipliers, it should be clear that
their values at the optimum are not related to their values in the previous, one-performance-measure,
contracting problem.

When is it optimal to make compensation a function of y as well as of x? The
answer is straightforward: When the likelihood ratio,

$$r(x, y) = \frac{f_0(x, y)}{f_1(x, y)},$$

actually depends upon y. Conversely, when the likelihood ratio is independent of y, then
there is no gain from contracting on y to induce a = 1; indeed, it would be
sub-optimal in this case since such a compensation scheme would fail to satisfy
(11). The likelihood ratio is independent of y if and only if the following holds:
There exist three functions $h(\cdot, \cdot)$, $g_0(\cdot)$, and $g_1(\cdot)$ such that, for all (x, y),

$$f_a(x, y) = h(x, y)\, g_a(x). \tag{12}$$

Necessity is obvious (divide $f_0(x, y)$ by $f_1(x, y)$ and observe the ratio, $g_0(x)/g_1(x)$,
is independent of y). Sufficiency is also straightforward: Set $h(x, y) = f_1(x, y)$,
$g_1(x) = 1$, and $g_0(x) = r(x)$. This condition of multiplicative separability, (12),
has a well-established meaning in statistics: If (12) holds, then x is a sufficient
statistic for the action a given data (x, y). In words, were we trying to infer a,
our inference would be just as good if we observed only x as it would be if we
observed the pair (x, y). That is, conditional on knowing x, y is uninformative
about a.
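The separability condition (12) is easy to test on discrete examples. In the Python sketch below, joint densities are stored as dictionaries keyed by (x, y); the helper name and all numbers are hypothetical. The first pair of densities is built as h(x, y) g_a(x), so x is a sufficient statistic; the second pair is not separable.

```python
from fractions import Fraction as Fr


def ratio_depends_on_y(f0, f1):
    """True if r(x, y) = f0(x, y) / f1(x, y) varies with y for some x.

    f0 and f1 map (x, y) pairs to probabilities, with f1 > 0 everywhere.
    """
    ratios_by_x = {}
    for (x, y), p1 in f1.items():
        ratios_by_x.setdefault(x, set()).add(f0[(x, y)] / p1)
    return any(len(ratios) > 1 for ratios in ratios_by_x.values())


# Separable case: h is the distribution of y given x, g_a the marginal of x.
h = {(0, 0): Fr(1, 4), (0, 1): Fr(3, 4), (1, 0): Fr(2, 3), (1, 1): Fr(1, 3)}
g0 = {0: Fr(3, 5), 1: Fr(2, 5)}
g1 = {0: Fr(1, 5), 1: Fr(4, 5)}
sep_f0 = {xy: h[xy] * g0[xy[0]] for xy in h}
sep_f1 = {xy: h[xy] * g1[xy[0]] for xy in h}

# Non-separable case: y carries information about a beyond what x reveals.
non_f0 = {xy: Fr(1, 4) for xy in h}
non_f1 = {(0, 0): Fr(1, 10), (0, 1): Fr(2, 10), (1, 0): Fr(3, 10), (1, 1): Fr(4, 10)}

print(ratio_depends_on_y(sep_f0, sep_f1), ratio_depends_on_y(non_f0, non_f1))
```

In the separable case the function reports that the ratio does not depend on y, so a contract on x alone loses nothing; in the second case it does depend on y, and the optimal contract would use both measures.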

This conclusion is, therefore, quite intuitive, once we recall that the value
of performance measures to our contracting problem rests solely on their statistical
properties. The optimal contract should be based on all performance
measures that convey information about the agent's decision; but it is not desirable
to include performance measures that are statistically redundant with
other measures. As a corollary, there is no gain from considering ex post random
contracts (e.g., a contract that based rewards on $x + \eta$, where $\eta$ is some random
variable, i.e., noise, distributed independently of a, that is added to x). 22 As a
second corollary, if the principal could freely eliminate noise in the performance
measure (i.e., switch from observing $x + \eta$ to observing x), she would do better
(at least weakly).

2.7 Conclusions from the two-action model 

It may be worth summing up all the conclusions we have reached within the 
two-action model in a proposition: 

Proposition 2 If the agent is strictly risk averse, there is no shifting support,
and the principal seeks to implement the costly action (i.e., a = 1), then the
principal's expected profits are smaller than under full (perfect) information.
In some instances, this reduction in expected profits may lead the principal to
implement the "free" action (i.e., a = 0).

• When (10) holds, the reward schedule imposes some risk on the risk-averse
agent: Performances that are more likely when the agent takes the correct
action a = 1 are rewarded more than performances that are more likely
under a = 0.

• Under MLRP (or when the agent can destroy output), the optimal reward
schedule is non-decreasing in performance.

• The optimal reward schedule depends only upon performance measures that
are sufficient statistics for the agent's action.

22 Ex ante random contracts may, however, be of some value, as explained later.

To conclude, let us simply stress the two major themes that we would like 
the reader to remember from this section. First, imperfect information implies 
that the contractual reward designed by the principal should perform two tasks: 
Share the risks involved in the relationship and provide incentives to induce the 
agent to undertake the desired action. Except in trivial cases (e.g., a risk-neutral 
agent or a shifting support), these two goals are in conflict. Consequently, 
the optimal contract may induce an inefficient action and a Pareto suboptimal 
sharing of risk. Second, the optimal reward schedule establishes a link between 
rewards and performances that depends upon the statistical properties of the 
performance measures with respect to the agent's action. 



2.8 Bibliographic notes 



The analysis presented so far is fairly standard. The two-step approach (first
determine, separately, the optimal contracts for implementing a = 0 and a = 1,
then choose which yields greater profits) is due to Grossman and Hart (1983).
The analysis, in the two-outcome case, when q varies is also based on their
work. They also consider the monotonicity of the optimal contract, although
our analysis here draws more from Holmstrom (1979). Holmstrom is also our
source for the sufficient-statistic result. Finally, the expression

$$\frac{1}{u'[S(x)]} = \lambda + \mu \left[ 1 - \frac{f_0(x)}{f_1(x)} \right],$$

which played such an important part in our analysis, is frequently referred to
as the modified Borch sharing rule, in honor of Borch (1968), who worked out
the rules for optimal risk sharing absent a moral-hazard problem (hence, the
adjective "modified").



3 General formal setting 

As we've just seen, the two-action model yields strong results. But the model 
incorporates a lot of structure and it relies on strong assumptions. Consequently, 
it's hard to understand which findings are robust and which are merely artifacts 
of an overly simple formalization. The basic ideas behind the incentive model 
are, we think, quite deep and we ought to show whether and how they generalize 
in less constrained situations. 

Our approach is to propose a very general framework that captures the 
situation described in the opening section. Such generality comes at the cost of 
tractability, so we will again find ourselves making specific assumptions. But
in doing so, we will try to motivate the assumptions we have to make and discuss
their relevance or underline how strong they are.

The situation of incentive contracting under hidden action or imperfect
monitoring involves:

• a principal;

• an agent; 

• a set of possible actions, A, from which the agent chooses (we take A to 
be exogenously determined here); 

• a set of verifiable signals or performance measures, X; 

• a set of benefits, B, for the principal that are affected by the agent's action 
(possibly stochastically) ; 

• rules (functions, distributions, or some combination) that relate elements 
of A, X, and B; 

• preferences for the principal and agent; and 

• a bargaining game that establishes the contract between principal and
agent (here, recall, we've fixed the bargaining game as the principal making
a take-it-or-leave-it offer, so that the only element of the bargaining game
of interest here is the agent's reservation utility, $U_R$).23

In many settings, including the one explored above, the principal's benefit is
the same as the verifiable performance measure (i.e., $b = x$). But this need not
be the case. We could, for instance, imagine that there is a function mapping
the elements of A onto B. For example, the agent's action could be fixing the
"true" quality of a product produced for the principal. This quality is also,
then, the principal's benefit (i.e., $b = a$). The only verifiable measure of quality,
however, is some noisy (i.e., stochastic) measure of true quality (e.g., $x = a + \eta$,
where $\eta$ is some randomly determined distortion). As yet another possibility,
the benchmark case of full information entails $X = X' \times A$, where $X'$ is some
set of performance measures other than the action.

We need to impose some structure on X and B and their relationship to A:
We take X to be a Euclidean vector space and we let $dF(\cdot|a)$ denote the
probability measure over X conditional on a. Similarly, we take B to be a Euclidean
vector space and we let $dG(\cdot,\cdot|a)$ denote the joint probability measure over B
and X conditional on a (when $b = x$, we will write $dF(\cdot|a)$ instead of
$dG(\cdot,\cdot|a)$). This structure is rich enough to encompass the possibilities
enumerated in the previous paragraph (and more).

23 We could also worry about whether the principal wants to participate, or even
to make a take-it-or-leave-it offer, but because our focus is on the contract design
and its execution, stages of the game not reached if she doesn't wish to participate,
we will not explicitly consider this issue here.

Although we could capture the preferences of the principal and agent without
assuming the validity of the expected-utility approach to decision-making under
uncertainty (we could, for instance, take as primitives the indifference curves
shown in Figures 1 and 2), this approach has not been taken in the literature.24
Instead, the expected-utility approach is assumed to be valid and we
let $W(s,x,b)$ and $U(s,x,a)$ denote the respective von Neumann-Morgenstern
utilities of the principal and of the agent, where s denotes the transfer from the
principal to the agent (to principal from agent if s < 0).

In this situation, the obvious contract is a function that maps X into $\mathbb{R}$. We
define such a contract as follows:

Definition 2 A simple incentive contract is a reward schedule $S : X \to \mathbb{R}$ that
determines the level of reward $s = S(x)$ as a function of the realized
performance level x.

There is admittedly no other verifiable variable that can be used to write 
more elaborate contracts. There is, however, the possibility of creating verifiable 
variables, by having one or the other or both players take verifiable actions from 
some specified action spaces. Consistent with the mechanism-design approach,
the most natural interpretation of these new variables is that they are public
announcements made by the players; but nothing that follows requires this
interpretation. For example, suppose both parties have to report to the third
party charged with enforcing the contract their observation of x, or the agent 
must report which action he has chosen. We could even let the principal make 
a "good faith" report of what action she believes the agent took, although 
this creates its own moral-hazard problem because, in most circumstances, the 
principal could gain ex post by claiming she believes the agent's action was 
unacceptable. It turns out, as we will show momentarily, that there is nothing 
to be gained by considering such elaborate contracts; that is, there is no such 
contract that can improve over the optimal simple contract. 

24 Given some well-documented deficiencies in expected-utility theory (see, e.g., Epstein,
1992; Rabin, 1997), this might, at first, seem somewhat surprising. However, as Epstein, §2.5,
notes, many of the predictions of expected-utility theory are robust to relaxing some of the
more stringent assumptions that support it (e.g., the independence axiom). Given
the tractability of the expected-utility theory combined with the general empirical support for
the predictions of agency theory, we see the gain from sticking with expected-utility theory
as outweighing the losses, if any, associated with that theory.

25 Note this may require that there be some way that the parties can verify that the agent
has taken an action. This may simply be the passage of time: The agent must take his action
before a certain date. Alternatively, there could be a verifiable signal that the agent has acted
(but which does not reveal how he's acted).

26 Considering an extensive-form game with the various steps just considered would not alter
the reasoning that follows; so, we avoid these unnecessary details by restricting attention to
a normal-form game.

To see this, let us suppose that a contract determines a normal-form game to
be played by both players after the agent has taken his action.25,26 In particular,
suppose the agent takes an action $h \in \mathcal{H}$ after choosing his action, but prior to
the realization of x; that he takes an action $m \in \mathcal{M}$ after the realization of x; and
that the principal also takes an action $n \in \mathcal{N}$ after x has been realized. One or
more of these sets could, but need not, contain a single element, a "null" action.
We assume that the actions in these sets are costless: if we show that costless
elaboration does no better than simple contracts, then costly elaboration also
cannot do better than simple contracts. Finally, let the agent's compensation
under this elaborate contract be $s = S(x,h,m,n)$. We can now establish the
following:

Proposition 3 (Simple contracts are sufficient) For any general contract
$\langle \mathcal{H}, \mathcal{M}, \mathcal{N}, S(\cdot) \rangle$ and associated (perfect Bayesian) equilibrium,
there exists a simple contract $\tilde{S}(\cdot)$ that yields the same equilibrium outcome.



Proof: Consider a (perfect Bayesian) equilibrium of the original contract, involving
strategies $(a^*, h^*(\cdot), m^*(\cdot,\cdot))$ for the agent, where $h^*(a)$ and $m^*(x,a)$ describe the
agent's choice within $\mathcal{H}$ and $\mathcal{M}$ after he's taken action a and performance x has been
observed. Similarly, $n^*(x)$ gives the principal's choice of action as a function of the
observed performance. Let us now consider the simple contract defined as follows: For
all $x \in X$,

\[ \tilde{S}(x) = S\big(x, h^*(a^*), m^*(x,a^*), n^*(x)\big). \]

Suppose that, facing this contract, the agent strictly prefers some action $\tilde{a}$ different
from $a^*$. This implies that

\[ \int_X U\big(\tilde{S}(x), x, \tilde{a}\big)\, dF(x|\tilde{a}) > \int_X U\big(\tilde{S}(x), x, a^*\big)\, dF(x|a^*), \]

or, using the definition of $\tilde{S}(\cdot)$,

\[ \int_X U\big(S(x, h^*(a^*), m^*(x,a^*), n^*(x)), x, \tilde{a}\big)\, dF(x|\tilde{a})
 > \int_X U\big(S(x, h^*(a^*), m^*(x,a^*), n^*(x)), x, a^*\big)\, dF(x|a^*). \]

Since, in the equilibrium of the normal-form game that commences after the agent
chooses his action, $h^*(\cdot)$ and $m^*(\cdot,\cdot)$ must satisfy the following inequality:

\[ \int_X U\big(S(x, h^*(\tilde{a}), m^*(x,\tilde{a}), n^*(x)), x, \tilde{a}\big)\, dF(x|\tilde{a})
 \geq \int_X U\big(S(x, h^*(a^*), m^*(x,a^*), n^*(x)), x, \tilde{a}\big)\, dF(x|\tilde{a}), \]

it follows that

\[ \int_X U\big(S(x, h^*(\tilde{a}), m^*(x,\tilde{a}), n^*(x)), x, \tilde{a}\big)\, dF(x|\tilde{a})
 > \int_X U\big(S(x, h^*(a^*), m^*(x,a^*), n^*(x)), x, a^*\big)\, dF(x|a^*). \]

This contradicts the fact that $a^*$ is an equilibrium action in the game defined by the
original contract. Hence, the simple contract $\tilde{S}(\cdot)$ gives rise to the same action choice,
and therefore the same distribution of outcomes, as the more complicated contract. ■

As a consequence, there is no need to consider sophisticated announcement
mechanisms in this setting, at least in the simple situation we have described.27
It is worth noting that this style of proof is used many times in the study of
contract theory. In particular, it is used to prove the Revelation Principle.

The contracting problem under imperfect information can now easily be 
stated. The principal, having the bargaining power in the negotiation process, 
simply has to choose a (simple) contract, S(-), so as to maximize her expected 
utility from the relationship given two constraints. First, the contract S(-) 
induces the agent to choose an action that maximizes his expected utility (i.e., 
the IC constraint must be met). Second, given the contract and the action it 
will induce, the agent must receive an expected utility at least as great as his 
reservation utility (i.e., the IR constraint must be met). In this general setting,
the IC constraint can be stated as follows: the action induced, a, must satisfy

\[ a \in \arg\max_{a'} \int_X U(S(x), x, a')\, dF(x|a'). \tag{13} \]

Observe that choosing $S(\cdot)$ amounts to choosing a as well, at least when there
exists a unique optimal choice for the agent. To take care of the possibility
of multiple optima, one can simply imagine that the principal chooses a pair
$(S(\cdot), a)$ subject to the incentive constraint (13). The IR constraint takes the
simple form:

\[ \max_{a'} \int_X U(S(x), x, a')\, dF(x|a') \geq U_R. \tag{14} \]

The principal's problem is, thus,

\[ \max_{(S(\cdot),a)} \int_X W(S(x), x, b)\, dG(b, x|a) \tag{15} \]

s.t. (13) and (14).

Observe, as we did in Section 2, that it is perfectly permissible to solve this
maximization program in two steps. First, for each action a, find the
expected-profit-maximizing contract that implements action a subject to the IC and IR
constraints; this amounts to solving a similar program, taking action a as fixed:

\[ \max_{S(\cdot)} \int_X W(S(x), x, b)\, dG(b, x|a) \tag{16} \]

s.t. (13) and (14).

Second, optimize the principal's objectives with respect to the action to
implement; if we let $S^a(\cdot)$ denote the expected-profit-maximizing contract for
implementing a, this second step consists of:

\[ \max_{a \in A} \int_X W(S^a(x), x, b)\, dG(b, x|a). \]

27 We will see that more sophisticated contracts may be strictly valuable when, for example,
the agent gets an early private signal about performances. Intuitively though, sophisticated
contracts are only useful when one player gets some private information about the realization
of the state of nature during the relationship.

In this more general framework, it's worth revisiting the full-information
benchmark. Before doing that, however, it is worth assuming that the domain
of $U(\cdot, x, a)$ is sufficiently broad:

• Existence of a punishment: There exists some $s_P$ in the domain of
$U(\cdot, x, a)$ for all $x \in X$ and $a \in A$ such that

\[ \int_X U(s_P, x, a')\, dF(x|a') < U_R \]

for all $a' \in A$.

• Existence of a sufficient reward: There exists some $s_R$ in the domain
of $U(\cdot, x, a)$ for all $x \in X$ and $a \in A$ such that

\[ \int_X U(s_R, x, a')\, dF(x|a') \geq U_R \]

for all $a' \in A$.

In light of the second assumption, we can always satisfy (14) for any action a
(there is no guarantee, however, that we can also satisfy (13)).

With these two assumptions in hand, suppose that we're in the full-information
case; that is, $X = X' \times A$ (note $X'$ could be a single-element space, so that we're
also allowing for the possibility that, effectively, the only performance measure
is the action itself). In the full-information case, the principal can rely on
forcing contracts; that is, contracts that effectively leave the agent with no choice
over the action he chooses. Hence, writing $(x', a)$ for an element of X, a forcing
contract for implementing $\hat{a}$ is

\[ S(x', a) = \begin{cases} s_P & \text{if } a \neq \hat{a} \\ S^F(x') & \text{if } a = \hat{a}, \end{cases} \]

where $S^F(\cdot)$ satisfies (14). Since $S^F(x') = s_R$ satisfies (14) by assumption, we
know that we can find an $S^F(\cdot)$ that satisfies (14). In equilibrium, the agent will
choose to sign the contract (the IR constraint is met) and he will take action $\hat{a}$
since this is his only possibility for getting at least his reservation utility. Forcing
contracts are very powerful because they transform the contracting problem into
a simple ex ante Pareto computation program:

\[ \max_{(S(\cdot),a)} \int_X W(S(x), x, b)\, dG(b, x|a) \tag{18} \]

s.t. (14),

where only the agent's participation constraint matters. This ex ante Pareto
program determines the efficient risk-sharing arrangement for the
full-information optimal action, as well as the full-information optimal action itself. Its
solution characterizes the optimal contract under perfect information.

At this point, we've gone about as far as we can go without imposing more
structure on the problem. The next few sections consider more structured
variations of the problem.






4 The Finite Model 

In this section, we suppose that A, the set of possible actions, is finite with J
elements. Likewise, the set of possible verifiable performance measures, X, is
also taken to be finite with N elements, indexed by n (although, at the end of
this section, we'll discuss the case where $X = \mathbb{R}$). Given the world is, in reality,
finite, this is the most general version of the principal-agent model (although
not necessarily the most analytically tractable).28

We will assume that the agent's utility is additively separable between
payments and action. Moreover, it is not, directly, dependent on performance.
Hence,

\[ U(s, x, a) = u(s) - c(a), \]

where $u : \mathcal{S} \to \mathbb{R}$ maps some subset $\mathcal{S}$ of $\mathbb{R}$ into $\mathbb{R}$ and $c : A \to \mathbb{R}$ maps the
action space into $\mathbb{R}$. As before, we assume that $\mathcal{S} = (\underline{s}, \infty)$, where $u(s) \to -\infty$
as $s \downarrow \underline{s}$. Observe that this assumption entails the existence of a punishment,
$s_P$, as described in the previous section. We further assume that $u(\cdot)$ is strictly
monotonic and concave (at least weakly). Typically, we will assume that $u(\cdot)$
is, in fact, strictly concave, implying the agent is risk averse. Note that the
monotonicity of $u(\cdot)$ implies that the inverse function $u^{-1}(\cdot)$ exists and, since
$u(\mathcal{S}) = \mathbb{R}$, is defined for all $u \in \mathbb{R}$.

We assume, now, that $B \subset \mathbb{R}$ and that the principal's utility is a function
only of the difference between her benefit, b, and her payment to the agent; that
is,

\[ W(s, x, b) = w(b - s), \]

where $w(\cdot)$ is assumed to be strictly increasing and concave. In fact, in most
applications, particularly most applications of interest in the study of strategy
and organization, it is reasonable to assume that the principal is risk neutral.
We will maintain that assumption here (the reader interested in the case of a
risk-averse principal should consult Holmstrom, 1979, among other work). In
what follows, let $B(a) = E\{b|a\}$.

28 Limitations on measurement, of both inputs and outputs, make the world discrete. For
instance, effort, measured as time on the job, can take only discrete values because only
discrete intervals of time can be measured. Similarly, output, measured as volume, can take
only discrete values because only discrete intervals of volume can be measured. Because the
world is also bounded (each of us is allotted only so much time and there are only so many
atoms in the universe, under current cosmological theories), it follows that the world is finite.

In addition to being discrete, we assume that there exists some partial order
on X (i.e., to give meaning to the idea of "better" or "worse" performance) and
that, with respect to this partial order, X is a chain (i.e., if $\preceq$ is the partial order
on X, then $x \preceq x'$ or $x' \preceq x$ for any two elements, x and x', in X). Because
identical signals are irrelevant, we may also suppose that no two elements of
X are the same (i.e., $x \prec x'$ or $x' \prec x$ for any two elements in X). The most
natural interpretation is that X is a subset of distinct real numbers (different
"performance scores") with $\leq$ as the partial order. Given these assumptions,
we can write $X = \{x_1, \ldots, x_N\}$, where $x_m \prec x_n$ if $m < n$. Likewise the
distribution function $F(x|a)$ gives, for each x, the probability that an $x' \preceq x$
is realized conditional on action a. The corresponding density function is then
defined by

\[ f(x_n|a) = \begin{cases} F(x_1|a) & \text{if } n = 1 \\ F(x_n|a) - F(x_{n-1}|a) & \text{if } n > 1. \end{cases} \]

In much of what follows, it will be convenient to write $f_n(a)$ for $f(x_n|a)$. It
will also be convenient to write the density as a vector:

\[ \mathbf{f}(a) = (f_1(a), \ldots, f_N(a))^T \]

(observe, unless indicated otherwise, vectors are column vectors; hence the final
T, which denotes matrix transpose, is needed to transform the row vector into
a column vector).
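To make the definition concrete, here is a minimal Python sketch that recovers the density vector from a distribution function on a finite, ordered X. The function name `density_from_cdf` and the three-point distribution are hypothetical and serve only as an illustration.

```python
def density_from_cdf(cdf):
    """Return [f_1, ..., f_N] given [F(x_1|a), ..., F(x_N|a)].

    f_1 = F(x_1|a) and f_n = F(x_n|a) - F(x_{n-1}|a) for n > 1.
    """
    return [cdf[0]] + [cdf[n] - cdf[n - 1] for n in range(1, len(cdf))]


# Hypothetical distribution function for some action a, with N = 3.
F_a = [0.2, 0.5, 1.0]
f_a = density_from_cdf(F_a)   # the vector f(a), stored as a plain list
```

Because the last entry of the distribution function is 1, the resulting densities sum to 1.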

4.1 The "two-step" approach 

As we have done already in Section 2, we will pursue a two-step approach to 
solving the principal-agent problem: 

Step 1: For each $a \in A$, the principal determines whether it can be
implemented. Let $A^I$ denote the set of implementable actions. For each $a \in A^I$,
the principal determines the least-cost contract for implementing a subject
to the IC and IR constraints. Let $C(a)$ denote the principal's expected
cost (expected payment) of implementing a under this least-cost contract.

Step 2: The principal then determines the solution to the maximization
problem

\[ \max_{a \in A^I} B(a) - C(a). \]

If $a^*$ is the solution to this maximization problem, the principal offers the
least-cost contract for implementing $a^*$.

Note that this two-step process is analogous to a standard production prob- 
lem, in which a firm, first, solves its cost-minimization problems to determine 
the least-cost way of producing any given amount of output (i.e., derives its 
cost function); and, then, it produces the amount of output that maximizes the 
difference between revenues (benefits) and cost. As with production problems, 
the first step is generally the harder step. 
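The two steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `two_step`, the convention that a non-implementable action has cost `None`, and all numbers are hypothetical, and the hard part (computing the least-cost contract behind each C(a)) is taken as given.

```python
def two_step(actions, benefit, least_cost):
    """Return (a_star, profit) from the two-step procedure.

    least_cost(a) is assumed to return C(a), or None if a is not implementable.
    """
    # Step 1: compute C(a) and keep only the implementable actions.
    cost = {a: least_cost(a) for a in actions}
    implementable = [a for a in actions if cost[a] is not None]
    # Step 2: maximize B(a) - C(a) over the implementable set.
    a_star = max(implementable, key=lambda a: benefit[a] - cost[a])
    return a_star, benefit[a_star] - cost[a_star]


# Hypothetical numbers, for illustration only.
B = {"a0": 2.0, "a1": 4.0, "a2": 9.0}
C = {"a0": 1.0, "a1": 1.8, "a2": None}   # a2 cannot be implemented
a_star, profit = two_step(list(B), B, C.get)
```

In this example the action with the highest benefit is not implementable, so the principal settles for the best action within the implementable set.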

4.2 The full-information benchmark 

As before, we consider as a benchmark the case where the principal can observe
and verify the agent's action. Consequently, as we discussed at the end of Section
3, the principal can implement any action $\hat{a}$ that she wants using a forcing
contract: The contract punishes the agent sufficiently for choosing actions $a \neq \hat{a}$
that he would never choose any action other than $\hat{a}$; and the contract rewards
the agent sufficiently for choosing $\hat{a}$ that he is just willing to sign the principal's
contract. This last condition can be stated formally as

\[ u(\hat{s}) - c(\hat{a}) = U_R, \tag{IR$^F$} \]

where $\hat{s}$ is what the agent is paid if he chooses action $\hat{a}$. Solving this last
expression for $\hat{s}$ yields

\[ \hat{s} = u^{-1}[U_R + c(\hat{a})] = C^F(\hat{a}). \]

The function $C^F(\cdot)$ gives the cost, under full information, of implementing
actions.
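As a numerical illustration of the full-information cost function, the sketch below assumes logarithmic utility, u(s) = ln s, which satisfies the maintained assumptions (domain (0, ∞), with u(s) tending to minus infinity as s falls to 0). The utility function, the reservation utility, and the disutilities are all assumptions made for the example.

```python
import math

U_R = 0.0                                  # reservation utility (assumed)
c = {"a0": 0.0, "a1": 0.5, "a2": 1.0}      # hypothetical disutilities c(a)


def u(s):
    return math.log(s)      # assumed utility over money


def u_inv(v):
    return math.exp(v)      # inverse of the assumed u


def full_info_cost(a):
    """C^F(a) = u^{-1}[U_R + c(a)]."""
    return u_inv(U_R + c[a])


C_F = {a: full_info_cost(a) for a in c}
```

Paying C_F[a] leaves the agent exactly at his reservation utility, and more costly actions require a larger payment.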

4.3 The hidden-action problem 

Now, and henceforth, we assume that a hidden-action problem exists.
Consequently, the only feasible contracts are those that make the agent's compensation
contingent on the verifiable performance measure. Let $s(x)$ denote the payment
made to the agent under such a contract if x is realized. It will prove convenient
to write $s_n$ for $s(x_n)$ and to consider the compensation vector $\mathbf{s} = (s_1, \ldots, s_N)^T$.
The optimal (expected-cost-minimizing) contract for implementing $\hat{a}$ (assuming
it can be implemented) is the contract that solves the following program:29

\[ \min_{\mathbf{s}} \mathbf{f}(\hat{a})^T \mathbf{s} \]

subject to

\[ \sum_{n=1}^{N} f_n(\hat{a})\, u(s_n) - c(\hat{a}) \geq U_R \]

(the IR constraint) and

\[ \hat{a} \in \arg\max_{a} \sum_{n=1}^{N} f_n(a)\, u(s_n) - c(a) \]

(the IC constraint; see (13)). Observe that an equivalent statement of the IC
constraint is

\[ \sum_{n=1}^{N} f_n(\hat{a})\, u(s_n) - c(\hat{a}) \geq \sum_{n=1}^{N} f_n(a)\, u(s_n) - c(a) \quad \forall a \in A. \]

29 Observe, given the separability between the principal's benefit and cost, minimizing her
expected wage payment is equivalent to maximizing her expected profit.

As we've seen above, it is often easier to work in terms of utility payments
than in terms of monetary payments. Specifically, because $u(\cdot)$ is invertible,
we can express a contract as an N-dimensional vector of contingent utilities,
$\mathbf{u} = (u_1, \ldots, u_N)^T$, where $u_n = u(s_n)$. Using this "trick," the principal's
program becomes

\[ \min_{\mathbf{u}} \sum_{n=1}^{N} f_n(\hat{a})\, u^{-1}(u_n) \tag{19} \]

subject to

\[ \mathbf{f}(\hat{a})^T \mathbf{u} - c(\hat{a}) \geq U_R \tag{IR} \]

and

\[ \mathbf{f}(\hat{a})^T \mathbf{u} - c(\hat{a}) \geq \mathbf{f}(a)^T \mathbf{u} - c(a) \quad \forall a \in A. \tag{IC} \]
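A small worked example may help fix ideas. The sketch below assumes two actions, two outcomes, and logarithmic utility (so that the inverse of u is the exponential function); all numbers are hypothetical. In this special case a flat contract would violate (IC), so both (IR) and (IC) hold with equality at the least-cost contract, and the two utility levels can be solved for directly rather than by numerical optimization.

```python
import math

U_R = 0.0
c_hat, c_alt = 0.5, 0.0          # c(a_hat) > c(a_alt)
f_hat = [0.2, 0.8]               # f(a_hat) over (x_1, x_2)
f_alt = [0.6, 0.4]               # f(a_alt) over (x_1, x_2)

# Binding (IC): (f_2(a_hat) - f_2(a_alt)) * (u_2 - u_1) = c_hat - c_alt
spread = (c_hat - c_alt) / (f_hat[1] - f_alt[1])
# Binding (IR): u_1 + f_2(a_hat) * spread - c_hat = U_R
u1 = U_R + c_hat - f_hat[1] * spread
u = [u1, u1 + spread]


def expected_utility(f, cost):
    """Agent's expected utility f^T u - c under the contract u."""
    return sum(p * v for p, v in zip(f, u)) - cost


# Expected payment, using s_n = u^{-1}(u_n) = exp(u_n) under the assumed u.
hidden_action_cost = sum(p * math.exp(v) for p, v in zip(f_hat, u))
full_info_cost = math.exp(U_R + c_hat)
```

With these numbers the contract is u = (-0.5, 0.75), and its expected cost exceeds the full-information cost because the risk-averse agent must bear risk to be given incentives.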

Definition 3 An action $\hat{a}$ is implementable if there exists at least one contract
satisfying (IR) and (IC).

A key result is the following:

Proposition 4 If $\hat{a}$ is implementable, then there exists a contract that
implements $\hat{a}$ and satisfies (IR) as an equality. Moreover, (IR) is met as an equality
(i.e., is binding) under the optimal contract for implementing $\hat{a}$.

Proof: Suppose not: Let $\mathbf{u}$ be a contract that implements $\hat{a}$ and suppose that

\[ \mathbf{f}(\hat{a})^T \mathbf{u} - c(\hat{a}) > U_R. \]

Define

\[ \varepsilon = \mathbf{f}(\hat{a})^T \mathbf{u} - c(\hat{a}) - U_R. \]

By assumption, $\varepsilon > 0$. Consider a new contract, $\tilde{\mathbf{u}}$, where $\tilde{u}_n = u_n - \varepsilon$. By
construction, this new contract satisfies (IR). Moreover, because

\[ \mathbf{f}(a)^T \tilde{\mathbf{u}} = \mathbf{f}(a)^T \mathbf{u} - \varepsilon \]

for all $a \in A$, this new contract also satisfies (IC). Observe, too, that this new contract
is superior to $\mathbf{u}$: It satisfies the constraints, while costing the principal less. Hence, a
contract cannot be optimal unless (IR) is an equality under it. ■
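The construction in the proof can be checked numerically. The sketch below assumes logarithmic utility (so payments are the exponential of utilities) and uses hypothetical densities, costs, and a starting contract with slack in (IR); the labels "hat" and "alt" are illustrative names for the action to implement and an alternative.

```python
import math

U_R = 0.0
f = {"hat": [0.2, 0.8], "alt": [0.6, 0.4]}   # hypothetical densities
c = {"hat": 0.5, "alt": 0.0}                 # hypothetical disutilities


def agent_payoff(a, u):
    """f(a)^T u - c(a)."""
    return sum(p * v for p, v in zip(f[a], u)) - c[a]


def principal_cost(u):
    """Expected payment when "hat" is chosen, with s_n = exp(u_n)."""
    return sum(p * math.exp(v) for p, v in zip(f["hat"], u))


u = [0.0, 1.5]                                # satisfies (IC), slack (IR)
eps = agent_payoff("hat", u) - U_R            # the slack in (IR)
u_new = [v - eps for v in u]                  # shift every utility down

gap_before = agent_payoff("hat", u) - agent_payoff("alt", u)
gap_after = agent_payoff("hat", u_new) - agent_payoff("alt", u_new)
```

Subtracting the same constant from every contingent utility leaves all payoff differences across actions unchanged, which is why (IC) survives the shift while (IR) becomes binding and the principal pays less.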

In light of this proposition, it follows that an action $\hat{a}$ can be implemented
if there is a contract $\mathbf{u}$ that solves the following system:

\[ \mathbf{f}(\hat{a})^T \mathbf{u} - c(\hat{a}) = U_R \tag{20} \]

and

\[ \mathbf{f}(a)^T \mathbf{u} - c(a) \leq U_R \quad \forall a \in A \setminus \{\hat{a}\} \tag{21} \]

(where (21) follows from (IC) and (20)). We are now in position to establish
the following proposition:

Proposition 5 Action $\hat{a}$ is implementable if and only if there is no strategy
for the agent that induces the same density over signals as $\hat{a}$ and which costs
the agent less, in terms of expected disutility, than $\hat{a}$ (where "strategy" refers to
mixed as well as pure strategies).



Proof: Let $j = 1, \ldots, J-1$ index the elements in A other than $\hat{a}$. Then the system
(20) and (21) can be written as J + 1 inequalities:

\[ \begin{aligned}
\mathbf{f}(\hat{a})^T \mathbf{u} &\leq U_R + c(\hat{a}) \\
[-\mathbf{f}(\hat{a})]^T \mathbf{u} &\leq -U_R - c(\hat{a}) \\
\mathbf{f}(a_1)^T \mathbf{u} &\leq U_R + c(a_1) \\
&\ \vdots \\
\mathbf{f}(a_{J-1})^T \mathbf{u} &\leq U_R + c(a_{J-1}).
\end{aligned} \]

By a well-known result in convex analysis (see, e.g., Rockafellar, 1970, page 198), there
is a $\mathbf{u}$ that solves this system if and only if there is no vector

\[ \boldsymbol{\mu} = (\lambda_+, \lambda_-, \mu_1, \ldots, \mu_{J-1})^T \geq \mathbf{0}_{J+1} \]

(where $\mathbf{0}_K$ is a K-dimensional vector of zeros) such that

\[ \lambda_+ \mathbf{f}(\hat{a}) + \lambda_- [-\mathbf{f}(\hat{a})] + \sum_{j=1}^{J-1} \mu_j \mathbf{f}(a_j) = \mathbf{0}_N \tag{22} \]

and

\[ \lambda_+ [U_R + c(\hat{a})] + \lambda_- [-U_R - c(\hat{a})] + \sum_{j=1}^{J-1} \mu_j [U_R + c(a_j)] < 0. \tag{23} \]

Observe that if such a $\boldsymbol{\mu}$ exists, then (23) entails that not all elements can be zero.
Define $\mu^* = \lambda_+ - \lambda_-$. By taking the inner product of (22) with $\mathbf{1}_N$ (an N-dimensional
vector of ones), and recalling that each density sums to one, we see that

\[ \mu^* + \sum_{j=1}^{J-1} \mu_j = 0. \tag{24} \]

Equation (24) implies that $\mu^* < 0$. Define $\sigma_j = \mu_j / (-\mu^*)$. By construction each
$\sigma_j \geq 0$ (with at least some being strictly greater than 0) and, from (24), $\sum_{j=1}^{J-1} \sigma_j = 1$.
Hence, we can interpret these $\sigma_j$ as probabilities and, thus, as a mixed strategy over
the elements of $A \setminus \{\hat{a}\}$. Finally, observe that if we divide both sides of (22) and (23)
by $-\mu^*$ and rearrange, we can see that (22) and (23) are equivalent to

\[ \mathbf{f}(\hat{a}) = \sum_{j=1}^{J-1} \sigma_j \mathbf{f}(a_j) \tag{25} \]

and

\[ c(\hat{a}) > \sum_{j=1}^{J-1} \sigma_j c(a_j); \tag{26} \]

that is, there is a contract $\mathbf{u}$ that solves the above system of inequalities if and only
if there is no (mixed) strategy that induces the same density over the performance
measures as $\hat{a}$ (i.e., (25)) and that has lower expected cost (i.e., (26)). ■

The truth of the necessity condition (only if part) of Proposition 5 is
straightforward: Were there such a strategy (one that always produced the same
expected utility over money as $\hat{a}$, but which cost the agent less than $\hat{a}$), then it
would clearly be impossible to implement $\hat{a}$ as a pure strategy. What is less
obvious is the sufficiency (if part) of the proposition. Intuitively, if the density
over the performance measure induced by $\hat{a}$ is distinct from the density induced
by any other strategy, then the performance measure is informative with respect
to determining whether $\hat{a}$ was the agent's strategy or whether he played a
different strategy. Because the range of $u(\cdot)$ is unbounded, even a small amount
of information can be exploited to implement $\hat{a}$ by rewarding the agent for
performance that is relatively more likely to occur when he plays the strategy $\hat{a}$, or
by punishing him for performance that is relatively unlikely to occur when he
plays the strategy $\hat{a}$, or both. Of course, even if there are other strategies that
induce the same density as $\hat{a}$, $\hat{a}$ is still implementable if the agent finds these
other strategies no less costly than $\hat{a}$.



Technical Aside 

We can formalize this notion of informationally distinct as follows: The condition
that no strategy duplicate the density over performance measures induced by $\hat{a}$ is
equivalent to saying that there is no density (strategy) $(\sigma_1, \ldots, \sigma_{J-1})$ over the other
J − 1 elements of A such that

\[ \mathbf{f}(\hat{a}) = \sum_{j=1}^{J-1} \sigma_j \mathbf{f}(a_j). \]

Mathematically, that's equivalent to saying that $\mathbf{f}(\hat{a})$ is not a convex combination
of $\{\mathbf{f}(a)\}_{a \in A \setminus \{\hat{a}\}}$; or, equivalently, that $\mathbf{f}(\hat{a})$ is not in the convex hull of
$\{\mathbf{f}(a) \mid a \neq \hat{a}\}$. See Hermalin and Katz (1991) for more on this "convex-hull" condition and its
interpretation. Finally, from Proposition 5, the condition that $\mathbf{f}(\hat{a})$ not be in the
convex hull of $\{\mathbf{f}(a) \mid a \neq \hat{a}\}$ is sufficient for $\hat{a}$ to be implementable.
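The condition in Proposition 5 is easy to check by hand when there are only two alternative actions, because a mix over two densities has a single free weight. The following Python sketch covers that special case only (a general check would require a linear-programming solver); the function names and all numbers are hypothetical.

```python
def duplicating_cost(f_hat, alts, tol=1e-9):
    """Cheapest expected cost of a mix over two alternatives that reproduces
    f_hat, or None if no such mix exists.

    alts is a pair ((f_1, c_1), (f_2, c_2)) of densities and disutilities.
    """
    (f1, c1), (f2, c2) = alts
    diffs = [a - b for a, b in zip(f1, f2)]
    n = max(range(len(diffs)), key=lambda i: abs(diffs[i]))
    if abs(diffs[n]) <= tol:                    # the two densities coincide
        same = all(abs(a - h) <= tol for a, h in zip(f1, f_hat))
        return min(c1, c2) if same else None
    sigma = (f_hat[n] - f2[n]) / diffs[n]       # only candidate weight on f_1
    if sigma < -tol or sigma > 1 + tol:
        return None
    mix = [sigma * a + (1 - sigma) * b for a, b in zip(f1, f2)]
    if any(abs(m - h) > tol for m, h in zip(mix, f_hat)):
        return None
    return sigma * c1 + (1 - sigma) * c2


def implementable(f_hat, c_hat, alts):
    """Proposition 5: no duplicating strategy may cost less than c_hat."""
    dup = duplicating_cost(f_hat, alts)
    return dup is None or dup >= c_hat


alts = (([0.6, 0.3, 0.1], 0.0), ([0.2, 0.3, 0.5], 0.4))
in_hull = [0.4, 0.3, 0.3]        # equals 0.5 * f_1 + 0.5 * f_2
outside_hull = [0.3, 0.4, 0.3]
```

Here `in_hull` can be duplicated at expected cost 0.2, so an action with that density is implementable only if its own cost is at most 0.2, whereas an action with density `outside_hull` is implementable whatever its cost.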

Before solving the principal's problem (Step 1 of the two-step approach), it's worth
considering, and then dismissing, two "pathological" cases. The first is the ability
to implement least-cost actions at their full-information cost. The second is
the ability to implement any action at its full-information cost when there is a
shifting support (of the right kind).

Definition 4 An action $\hat{a}$ is a least-cost action if $\hat{a} \in \arg\min_{a \in A} c(a)$. That
is, $\hat{a}$ is a least-cost action if the agent's disutility from choosing any other action
is at least as great as his disutility from choosing $\hat{a}$.

Proposition 6 If $\hat{a}$ is a least-cost action, then it is implementable at its
full-information cost.

Proof: Consider the fixed-payment contract that pays the agent $u_n = U_R + c(\hat{a})$ for
all n. This contract clearly satisfies (IR) and, because $c(\hat{a}) \leq c(a)$ for all $a \in A$, it also
satisfies (IC). The cost of this contract to the principal is $u^{-1}[U_R + c(\hat{a})] = C^F(\hat{a})$,
the full-information cost. ■



Of course, there is nothing surprising to Proposition 6: When the principal 
wishes to implement a least-cost action, her interests and the agent's are per- 
fectly aligned; that is, there is no agency problem. Consequently, it is not 
surprising that the full-information outcome obtains. 

Definition 5 There is a meaningful shifting support associated with action $\hat{a}$
if there exists a subset of X, $X_0$, such that $F(X_0|a) > 0 = F(X_0|\hat{a})$ for all
actions a such that $c(a) < c(\hat{a})$.

Proposition 7 Let there be a meaningful shifting support associated with action
$\hat{a}$. Then action $\hat{a}$ is implementable at its full-information cost.

Proof: Fix some arbitrarily small $\varepsilon > 0$ and define $u_P = u(\underline{s} + \varepsilon)$. Consider the
contract $\mathbf{u}$ that sets $u_m = u_P$ if $x_m \in X_0$ (where $X_0$ is defined above) and that sets
$u_n = U_R + c(\hat{a})$ if $x_n \notin X_0$. It follows that $\mathbf{f}(\hat{a})^T \mathbf{u} = U_R + c(\hat{a})$; that
$\mathbf{f}(a)^T \mathbf{u} \to -\infty$ as $\varepsilon \downarrow 0$ for a such that $c(a) < c(\hat{a})$; and that

\[ \mathbf{f}(a)^T \mathbf{u} - c(a) \leq U_R + c(\hat{a}) - c(a) \leq \mathbf{f}(\hat{a})^T \mathbf{u} - c(\hat{a}) \]

for a such that $c(a) \geq c(\hat{a})$. Consequently, this contract satisfies (IR) and (IC).
Moreover, the equilibrium cost of this contract to the principal is $u^{-1}[U_R + c(\hat{a})]$, the
full-information cost. ■



Intuitively, when there is a meaningful shifting support, observing an $x \in X_0$
is proof that the agent took an action other than $\hat{a}$. Because the principal has
this proof, she can punish the agent as severely as she wishes when such an x
appears (in particular, she doesn't have to worry about how this punishment
changes the risk faced by the agent, given the agent is never in jeopardy of
suffering this punishment if he takes the desired action, $\hat{a}$).30 Moreover, such a
draconian punishment will deter the agent from taking an action that induces
a positive probability of suffering the punishment. In effect, such actions have
been dropped from the original game, leaving $\hat{a}$ as a least-cost action of the new
game. It follows, then, from Proposition 6, that $\hat{a}$ can be implemented at its
full-information cost.
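The shifting-support argument can be illustrated numerically. The sketch below assumes logarithmic utility (so the lower bound of the payment domain is 0 and the punishment utility is ln ε), three outcomes, and one cheaper alternative action; every number is hypothetical.

```python
import math

U_R = 0.0
c_hat, c_alt = 0.5, 0.0
f_hat = [0.0, 0.3, 0.7]     # x_1 never occurs under a_hat, so X_0 = {x_1}
f_alt = [0.2, 0.5, 0.3]     # x_1 has positive probability under the cheaper action


def contract(eps):
    """Punish on X_0, pay the full-information utility elsewhere."""
    u_P = math.log(eps)     # u(s_low + eps) with u = ln and s_low = 0
    return [u_P, U_R + c_hat, U_R + c_hat]


def payoff(f, cost, u):
    """Agent's expected utility f^T u - c."""
    return sum(p * v for p, v in zip(f, u)) - cost


u = contract(0.01)
on_path = payoff(f_hat, c_hat, u)        # agent takes a_hat
deviation = payoff(f_alt, c_alt, u)      # agent takes the cheaper action
expected_payment = sum(p * math.exp(v) for p, v in zip(f_hat, u))
```

The agent earns exactly his reservation utility from the desired action, the deviation payoff falls without bound as ε shrinks, and the principal's expected payment equals the full-information cost because the punishment is never paid in equilibrium.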

30 It is worth noting that this argument relies on being able to punish the agent sufficiently
in the case of an $x \in X_0$. Whether the use of such punishments is really feasible could,
in some contexts, rely on assumptions that are overly strong. First, that the agent hasn't
(or can contractually waive) protections against severe punishments. For example, in the
English common-law tradition, this is generally not true; moreover, courts in these countries
are generally loath to enforce contractual clauses that are deemed to be penalties. Second,
that the agent has faith in his understanding of the distributions (i.e., he is sure that taking
action $\hat{a}$ guarantees that an $x \in X_0$ won't occur). Third, that the agent has faith in his own
rationality; that is, in particular, he is sufficiently confident that he won't make a mistake (i.e.,
choose an a such that $F(X_0|a) > 0$).

It is worth noting that the full-information benchmark is just a special case
of Proposition 7, in which the support of $\mathbf{f}(a)$ lies on a separate plane, $X' \times \{a\}$,
for each action a.

Typically, it is assumed that the action the principal wishes to implement
is neither a least-cost action, nor has a meaningful shifting support associated
with it. Henceforth, we will assume that the action that the principal wishes to
implement, $\hat{a}$, is not a least-cost action (i.e., $\exists a \in A$ such that $c(a) < c(\hat{a})$).
Moreover, we will rule out all shifting supports by assuming that $f_n(a) > 0$ for
all n and all a.

We now consider whether there is a solution to Step 1 when there is no
shifting support and the action to be implemented is not a least-cost action.
That is, we ask the question: If $\hat{a}$ is implementable, is there an optimal contract
for implementing it? We divide the analysis into two cases: $u(\cdot)$ affine (risk
neutral) and $u(\cdot)$ strictly concave (risk averse).

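Proposition 8 Assume $u(\cdot)$ is affine and that $\hat a$ is implementable. Then $\hat a$ is
implementable at its full-information cost.

Proof: Let $\mathbf u$ solve (IR) and (IC). From Proposition 4, we may assume that (IR) is
binding. Then, because $u(\cdot)$ and, thus, $u^{-1}(\cdot)$ are affine:

$$\sum_{n=1}^{N} f_n(\hat a)\,u^{-1}(u_n) = u^{-1}\!\left(\sum_{n=1}^{N} f_n(\hat a)\,u_n\right) = u^{-1}\big(U_R + c(\hat a)\big),$$

where the last equality follows from the fact that (IR) is binding. The right-hand side is
the full-information cost, $C_F(\hat a)$. ■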

Note, given that we can't do better than implement an action at full-information
cost, this proposition also tells us that, with a risk-neutral agent, an optimal
contract exists for inducing any implementable action. The hidden-action problem
(the lack of full information) is potentially costly to the principal for two
reasons. First, it may mean a desired action is not implementable. Second, even
if it is implementable, it may be implementable at a higher cost. Proposition
8 tells us that this second source of cost must be due solely to the agent's risk
aversion, an insight consistent with those derived earlier.

In fact, if we're willing to assume that the principal's benefit is alienable — 
that is, she can sell the rights to receive it to the agent — and that the agent 
is risk neutral, then we can implement the optimal full-information action, a* 
(i.e., the solution to Step 2 under full information) at full-information cost. In 
other words, we can achieve the complete full-information solution in this case: 

Proposition 9 (Selling the store) Assume that u(-) is affine and that the 
principal's benefit is alienable. Then the principal can achieve the same expected 
utility with a hidden-action problem as she could under full information. 



Proof: Under full information, the principal would induce $a^*$, where

$$a^* \in \arg\max_{a \in A}\; B(a) - C_F(a).$$

Define

$$t^* = B(a^*) - C_F(a^*).$$

Suppose the principal offers to sell the right to her benefit to the agent for $t^*$. If the
agent accepts, then the principal will enjoy the same utility she would have enjoyed
under full information. Will the agent accept? Note that because $u(\cdot)$ is affine, there
is no loss of generality in assuming it is the identity function. If he accepts, he faces
the problem

$$\max_{a \in A} \int_{\mathcal B} (b - t^*)\,dG(b|a) - c(a).$$

This is equivalent to

$$\max_{a \in A}\; B(a) - c(a) - B(a^*) + c(a^*) + U_R;$$

or to

$$\max_{a \in A}\; B(a) - [c(a) + U_R] - B(a^*) + c(a^*) + 2U_R.$$

Because $B(a) - [c(a) + U_R] = B(a) - C_F(a)$, the agent's problem has the same solution
as the principal's full-information problem. Rational play by the agent conditional
on accepting therefore means he chooses $a^*$ and his utility will be $U_R$; which also
means he'll accept. ■
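As a numerical illustration of the selling-the-store argument, the following Python sketch works through a three-action example with $u(\cdot)$ the identity. The values of `B`, `c`, and `U_R` are assumptions chosen purely for illustration; they do not come from the text.

```python
# Hypothetical primitives (assumptions for illustration only).
actions = [0.0, 1.0, 2.0]
B = {0.0: 0.0, 1.0: 5.0, 2.0: 7.0}   # principal's expected benefit B(a)
c = {0.0: 0.0, 1.0: 1.0, 2.0: 4.0}   # agent's cost of action c(a)
U_R = 2.0                            # agent's reservation utility


def full_info_cost(a):
    # With u the identity, C_F(a) = u^{-1}(U_R + c(a)) = U_R + c(a).
    return U_R + c[a]


# Full-information benchmark: a* maximizes B(a) - C_F(a).
a_star = max(actions, key=lambda a: B[a] - full_info_cost(a))
t_star = B[a_star] - full_info_cost(a_star)   # sale price t*


def agent_payoff(a):
    # After buying the benefit, the agent receives B(a), pays t*, and bears c(a).
    return B[a] - t_star - c[a]


a_agent = max(actions, key=agent_payoff)
print(a_star, t_star, a_agent, agent_payoff(a_agent))
```

In this example the agent's preferred action coincides with $a^*$ and his payoff equals `U_R`, so he is just willing to buy, while the principal collects `t_star`, her full-information payoff.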



People often dismiss the case where the agent is risk neutral by claiming that
there is no agency problem because the principal could "sell the store (productive
asset)" to the agent. As this last proposition makes clear, such a conclusion
relies critically on the ability to literally sell the asset; that is, if the principal's
benefit is not alienable, then this conclusion might not hold. 31 In other words,
it is not solely the agent's risk aversion that causes problems with a hidden
action.
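The counterexample in footnote 31 can be checked numerically. The sketch below assumes our reading of that footnote's primitives: $A = \{1/4, 1/2, 3/4\}$, two outcomes, $c(a) = \sqrt{a}$, $\Pr\{x = 2 \mid a\} = a$, $U_R = 0$, and $B(a) = 4 - 4(a - 1/2)^2$. It confirms that $a^* = 1/2$ and that an equal mix of $1/4$ and $3/4$ generates the same outcome density as $1/2$ at a lower expected cost, which is the kind of violation that rules out implementability.

```python
from math import sqrt

actions = [0.25, 0.5, 0.75]
U_R = 0.0


def c(a):
    # Agent's cost of action, c(a) = sqrt(a).
    return sqrt(a)


def B(a):
    # Principal's expected benefit.
    return 4 - 4 * (a - 0.5) ** 2


def density(a):
    # (Pr{x = 1}, Pr{x = 2}), assuming the probability of x = 2 equals a.
    return (1 - a, a)


# Risk-neutral agent (u = identity): the full-information cost is U_R + c(a).
a_star = max(actions, key=lambda a: B(a) - (U_R + c(a)))

# An equal mix of 1/4 and 3/4 reproduces the density induced by 1/2 ...
mix_density = tuple(0.5 * p + 0.5 * q
                    for p, q in zip(density(0.25), density(0.75)))
# ... at a strictly lower expected cost to the agent.
mix_cost = 0.5 * c(0.25) + 0.5 * c(0.75)

print(a_star, mix_density, mix_cost, c(0.5))
```

Because expected utility is linear in the density, any contract gives the mix the same expected utility gross of cost as $a = 1/2$; since the mix is cheaper, at least one of $1/4$ and $3/4$ must be strictly preferred to $1/2$ under every contract.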

Corollary 1 Assume that $u(\cdot)$ is affine and that the principal's benefit equals
the performance measure (i.e., $B = X$ and $G(\cdot|a) = F(\cdot|a)$). Then the principal
can achieve the same expected utility with a hidden-action problem as she could
under full information.

Proof: Left to the reader. [Hint: Let $s(x) = x - t$, where $t$ is a constant.] ■

Now we turn our attention to the case where $u(\cdot)$ is strictly concave (the
agent is risk averse). Observe (i) this entails that $u^{-1}(\cdot)$ is strictly convex; (ii),
because $S$ is an open interval, that $u(\cdot)$ is continuous; and (iii) that $u^{-1}(\cdot)$ is
continuous.

Proposition 10 Assume that $u(\cdot)$ is strictly concave. If $\hat a$ is implementable,
then there exists a unique contract that implements $\hat a$ at minimum expected cost.

31 To see this, suppose the benefit is unalienable. Assume, too, that $A = \{1/4, 1/2, 3/4\}$,
$X = \{1, 2\}$, $c(a) = \sqrt{a}$, $f_2(a) = a$, $U_R = 0$, and $B(a) = 4 - 4\left(a - \tfrac{1}{2}\right)^2$. Then it is readily seen
that $a^* = 1/2$. However, from Proposition 5, $a^*$ is not implementable, so the full-information
outcome is unobtainable when the action is hidden (even though the agent is risk neutral).
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Proof: Existence. Define 



$$\Omega(\mathbf u) = \sum_{n=1}^{N} f_n(\hat a)\,u^{-1}(u_n). \tag{27}$$

The strict convexity and continuity of $u^{-1}(\cdot)$ imply that $\Omega(\cdot)$ is also a strictly convex
and continuous function. Observe that the principal's problem is to choose $\mathbf u$ to
minimize $\Omega(\mathbf u)$ subject to (IR) and (IC). Let $\mathcal U$ be the set of contracts that satisfy
(IR) and (IC) (by assumption, $\mathcal U$ is not empty). Were $\mathcal U$ closed and bounded, then a
solution to the principal's problem would certainly exist because $\Omega(\cdot)$ is a continuous
real-valued function. 33 Unfortunately, $\mathcal U$ is not bounded (although it is closed given
that all the inequalities in (IR) and (IC) are weak inequalities). Fortunately, we can
artificially bound $\mathcal U$ by showing that any solution outside some bound is inferior to a
solution inside the bound. Consider any contract $\mathbf u^\circ \in \mathcal U$ and consider the contract
$\mathbf u^*$, where $u^*_n = U_R + c(\hat a)$ for all $n$. Let $\mathcal U^{IR}$ be the set of contracts that satisfy (IR). Note
that $\mathcal U \subset \mathcal U^{IR}$. Note, too, that both $\mathcal U$ and $\mathcal U^{IR}$ are convex sets. Because $\Omega(\cdot)$ has a
minimum on $\mathcal U^{IR}$, namely $\mathbf u^*$, the set

$$\mathcal V = \left\{\mathbf u \in \mathcal U^{IR} \;\middle|\; \Omega(\mathbf u) \le \Omega(\mathbf u^\circ)\right\}$$

is closed, bounded, and convex. 34 By construction, $\mathcal U \cap \mathcal V$ is non-empty; moreover, for
any $\mathbf u^1 \in \mathcal U \cap \mathcal V$ and any $\mathbf u^2 \in \mathcal U \setminus \mathcal V$, $\Omega(\mathbf u^2) > \Omega(\mathbf u^1)$. Consequently, nothing is lost
by limiting the search for an optimal contract to $\mathcal U \cap \mathcal V$. The set $\mathcal U \cap \mathcal V$ is closed and
bounded and $\Omega(\cdot)$ is continuous, hence it follows that an optimal contract must exist.

Uniqueness. Suppose the optimal contract, $\mathbf u$, were not unique. That is, there
exists another contract $\tilde{\mathbf u}$ such that $\Omega(\mathbf u) = \Omega(\tilde{\mathbf u})$ (where $\Omega(\cdot)$ is defined by (27)). It is
readily seen that if these two contracts each satisfy both the (IR) and (IC) constraints,
then any convex combination of them must as well (i.e., both are elements of $\mathcal U$, which
is convex). That is, the contract

$$\mathbf u_\lambda = \lambda \mathbf u + (1 - \lambda)\,\tilde{\mathbf u},$$

$\lambda \in (0, 1)$, must be feasible (i.e., satisfy (IR) and (IC)). Since $\Omega(\cdot)$ is strictly convex,
Jensen's inequality implies

$$\Omega(\mathbf u_\lambda) < \lambda\,\Omega(\mathbf u) + (1 - \lambda)\,\Omega(\tilde{\mathbf u}) = \Omega(\mathbf u).$$

But this contradicts the optimality of $\mathbf u$. By contradiction, uniqueness is established. ■

32 The existence portion of this proof is somewhat involved mathematically and can be
omitted without affecting later comprehension of the material.

33 This is a well-known result from analysis (see, e.g., Fleming, 1977, page 49).

34 The convexity of $\mathcal V$ follows because $\Omega(\cdot)$ is a convex function and $\mathcal U^{IR}$ is a convex set.
That $\mathcal V$ is closed follows given $\mathcal U^{IR}$ is also closed. To see that $\mathcal V$ is bounded, recognize that, as
one "moves away" from $\mathbf u^*$ while staying in $\mathcal U^{IR}$, $\Omega(\mathbf u)$ increases. Because $\Omega(\cdot)$ is convex,
any such movement away from $\mathbf u^*$ must eventually (i.e., for finite $\mathbf u$) lead to an $\Omega(\mathbf u) > \Omega(\mathbf u^\circ)$
(convex functions are unbounded above). Hence $\mathcal V$ is bounded.

Having concluded that a solution to Step 1 exists, we can, at last, calculate
what it is. From Proposition 8, the problem is trivial if $u(\cdot)$ is affine, so we will
consider only the case in which $u(\cdot)$ is strictly concave. The principal's problem
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is a standard nonlinear programming problem: Minimize a convex function 
(i.e., $\sum_{n=1}^{N} f_n(\hat a)\,u^{-1}(u_n)$) subject to $J$ constraints (one individual rationality
constraint and $J - 1$ incentive compatibility constraints, one for each action other
than $\hat a$). If we further assume, as we do henceforth, that $u(\cdot)$ is differentiable,
then the standard Lagrange-multiplier techniques can be employed. Specifically,
let $\lambda$ be the Lagrange multiplier on the IR constraint and let $\mu_j$ be the Lagrange
multiplier on the IC constraint between $\hat a$ and $a_j$, where $j = 1, \ldots, J - 1$ indexes
the elements of $A$ other than $\hat a$. It is readily seen that the first-order conditions
with respect to the contract are

$$\frac{f_n(\hat a)}{u'\!\left[u^{-1}(u_n)\right]} - \lambda f_n(\hat a) - \sum_{j=1}^{J-1} \mu_j \left[f_n(\hat a) - f_n(a_j)\right] = 0; \quad n = 1, \ldots, N.$$

We've already seen (Proposition 4) that the IR constraint binds, hence $\lambda > 0$.
Because $\hat a$ is not a least-cost action and there is no shifting support, it is readily
shown that at least one IC constraint binds (i.e., $\exists j$ such that $\mu_j > 0$). It's
convenient to rewrite the first-order condition as

$$\frac{1}{u'\!\left[u^{-1}(u_n)\right]} = \lambda + \sum_{j=1}^{J-1} \mu_j \left(1 - \frac{f_n(a_j)}{f_n(\hat a)}\right); \quad n = 1, \ldots, N. \tag{28}$$

Note the resemblance between (28) and (9) in Section 2.4. The difference is that,
now, we have more than one Lagrange multiplier on the actions (since we now
have more than two actions). In particular, we can give a similar interpretation
to the likelihood ratios, $f_n(a_j)/f_n(\hat a)$, that we had in that earlier section; with
the caveat that we now must consider more than one action.
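To make the first-order condition concrete, here is a small Python sketch for a two-outcome, two-action example with $u(s) = \sqrt{s}$, so that $u^{-1}(v) = v^2$ and $1/u'(s) = 2\sqrt{s}$. Every number below (`f_hat`, `f_alt`, the costs, and `U_R`) is an assumption made for illustration. With only one alternative action, the binding IR and IC constraints pin down the contract, and the multipliers can then be recovered from the rewritten first-order condition.

```python
# Hypothetical two-outcome, two-action example (all numbers are assumptions).
f_hat = (0.3, 0.7)      # density under the action to be implemented
f_alt = (0.7, 0.3)      # density under the cheaper alternative action
c_hat, c_alt = 1.0, 0.0
U_R = 2.0

# Binding IC: sum_n [f_hat_n - f_alt_n] * u_n = c_hat - c_alt.
# Binding IR: sum_n f_hat_n * u_n = U_R + c_hat.
gap = (c_hat - c_alt) / (f_hat[1] - f_alt[1])   # u_2 - u_1
u1 = (U_R + c_hat) - f_hat[1] * gap
u2 = u1 + gap

# With u(s) = sqrt(s), the left-hand side of the first-order condition is 2 * u_n.
# Solve 2 * u_n = lam + mu * (1 - f_alt_n / f_hat_n) for lam and mu.
k1 = 1 - f_alt[0] / f_hat[0]
k2 = 1 - f_alt[1] / f_hat[1]
mu = (2 * u2 - 2 * u1) / (k2 - k1)
lam = 2 * u1 - mu * k1

expected_cost = f_hat[0] * u1 ** 2 + f_hat[1] * u2 ** 2
full_info_cost = (U_R + c_hat) ** 2
print(u1, u2, lam, mu, expected_cost, full_info_cost)
```

Both multipliers come out positive in this example, and the expected payment exceeds the full-information cost, in line with the discussion of risk aversion above.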

4.4 Properties of the optimal contract 

Having solved for the optimal contract, we can now examine its properties. In
particular, we will consider three questions:

1. Under what conditions does the expected cost of implementing an action
under the optimal contract for the hidden-action problem exceed the
full-information cost of implementing that action?

2. Recall the performance measures, $x$, constitute distinct elements of a
chain. Under what conditions is the agent's compensation increasing with
the value of the signal (i.e., when does $x \prec x'$ imply $s(x) \le s(x')$)?

3. Consider two principal-agent models that are identical except that the
information structure (i.e., $\{\mathbf f(a) \mid a \in A\}$) in one is more informative than
the information structure in the other. How do the costs of implementing
actions vary between these two models?

The answer to the first question is given by




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Proposition 11 Consider a hidden-information problem. Assume there is no 
shifting support (i.e., $f_n(a) > 0$ for all $n$ and all $a$). Assume, too, that $u(\cdot)$ is
strictly concave. If $\hat a$ is not a least-cost action, then it cannot be implemented
at its full-information cost.

Proof: If $\hat a$ is not implementable, then the result is obvious; hence, we'll assume $\hat a$ is
implementable. Define $\mathcal U^{IR}$ to be the set of all contracts that satisfy the IR constraint
for $\hat a$. Let $\mathbf u^*$ be the contract in which $u^*_n = U_R + c(\hat a)$ for all $n$. Note $\mathbf u^* \in \mathcal U^{IR}$.
Finally define,

$$\Omega(\mathbf u) = \sum_{n=1}^{N} f_n(\hat a)\,u^{-1}(u_n).$$

Because $u(\cdot)$ is strictly concave, the principal's expected cost if the agent chooses $\hat a$
under contract $\mathbf u$, $\Omega(\mathbf u)$, is a strictly convex function of $\mathbf u$. By Jensen's inequality and
the fact that there is no shifting support, $\Omega(\cdot)$, therefore, has a unique minimum in
$\mathcal U^{IR}$, namely $\mathbf u^*$. Clearly, $\Omega(\mathbf u^*) = C_F(\hat a)$. The result, then, follows if we can show
that $\mathbf u^*$ is not incentive compatible. Given that $\hat a$ is not a least-cost action, there exists
an $a$ such that $c(\hat a) > c(a)$. But

$$\mathbf f(a)^{\mathsf T}\mathbf u^* - c(a) = U_R + c(\hat a) - c(a) > U_R = \mathbf f(\hat a)^{\mathsf T}\mathbf u^* - c(\hat a);$$

that is, $\mathbf u^*$ is not incentive compatible. ■



Note the elements that go into this proposition if $\hat a$ is implementable: There
must be an agency problem, that is, a misalignment of interests (i.e., $\hat a$ is not least cost);
there must, in fact, be a significant hidden-action problem (i.e., no shifting
support); and the agent must be risk averse. We saw earlier that without any one
of these elements, an implementable action is implementable at full-information
cost (Propositions 6-8); that is, each element is individually necessary for cost
to increase when we go from full information to a hidden action. This last
proposition shows, inter alia, that they are collectively sufficient for the cost to
increase.

Next we turn to the second question. We already know from our analysis of 
the two-action model that the assumptions we have so far made are insufficient 
for us to conclude that compensation will be monotonic. From our analysis 
of that model, we might expect that we need some monotone likelihood ratio 
property. In particular, we assume 

MLRP: Assume there is no shifting support. Then the monotone likelihood
ratio property is said to hold if, for any $a$ and $\hat a$ in $A$, $c(a) \le c(\hat a)$ implies that
$f_n(a)/f_n(\hat a)$ is nonincreasing in $n$.

Intuitively, MLRP is the condition that actions that the agent finds more costly 
be more likely to produce better performance. 
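A quick way to test MLRP on a finite example is to compare likelihood ratios outcome by outcome. The Python sketch below is only an illustration: the function name `satisfies_mlrp` and the example densities are hypothetical, and the cost comparison is read as weak (a cheaper or equally costly action is placed in the numerator).

```python
def satisfies_mlrp(costs, densities, tol=1e-12):
    """Return True if, for every ordered pair with c(a) <= c(a_hat), the
    ratio f_n(a) / f_n(a_hat) is nonincreasing in n. Densities must be
    strictly positive (no shifting support)."""
    actions = list(costs)
    for a in actions:
        for a_hat in actions:
            if a == a_hat or costs[a] > costs[a_hat]:
                continue
            ratios = [p / q for p, q in zip(densities[a], densities[a_hat])]
            # A single increase anywhere along the chain violates MLRP.
            if any(later > earlier + tol
                   for earlier, later in zip(ratios, ratios[1:])):
                return False
    return True


costs = {"low": 0.0, "high": 1.0}
good = {"low": [0.5, 0.3, 0.2], "high": [0.2, 0.3, 0.5]}
bad = {"low": [0.3, 0.5, 0.2], "high": [0.3, 0.2, 0.5]}
print(satisfies_mlrp(costs, good), satisfies_mlrp(costs, bad))
```

In the first family the costlier action shifts probability steadily toward higher outcomes; in the second, the middle outcome is relatively more likely under the cheap action than the lowest outcome is, so the ratio is not monotone.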

Unlike the two-action case, however, MLRP is not sufficient for us to obtain 
monotone compensation (see Grossman and Hart, 1983, for an example in which 
MLRP is satisfied but compensation is non-monotone). We need an additional 
assumption:

CDFP: The agency problem satisfies the concavity of distribution function
property if, for any $a$, $a'$, and $\hat a$ in $A$,

$$c(\hat a) = \lambda c(a) + (1 - \lambda)\,c(a') \quad \exists \lambda \in (0, 1)$$

implies that $F(\cdot|\hat a)$ first-order stochastically dominates $\lambda F(\cdot|a) + (1 - \lambda) F(\cdot|a')$. 35

Another way to state the CDFP is that the distribution over performance is 
better — more likely to produce high signals — if the agent plays a pure strategy 
than it is if he plays any mixed strategy over two actions when that mixed 
strategy has the same expected disutility as the pure strategy. 
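The CDFP comparison can be sketched in Python for a single triple of actions. Everything here is illustrative: the helper names and the densities are assumptions, and the dominance test is the weak version (it does not check that the inequality is strict somewhere).

```python
def cdf(pmf):
    """Cumulative sums of a probability mass function."""
    total, out = 0.0, []
    for p in pmf:
        total += p
        out.append(total)
    return out


def weakly_fosd(pmf_a, pmf_b, tol=1e-12):
    """True if the CDF of pmf_a lies weakly below the CDF of pmf_b everywhere."""
    return all(Fa <= Fb + tol for Fa, Fb in zip(cdf(pmf_a), cdf(pmf_b)))


def cdfp_holds_for_triple(c_mid, f_mid, c_lo, f_lo, c_hi, f_hi):
    """Check the CDFP implication for one triple with c_lo < c_mid < c_hi."""
    # Weight on the low-cost action so that the mix costs exactly c_mid.
    lam = (c_mid - c_hi) / (c_lo - c_hi)
    mix = [lam * p + (1 - lam) * q for p, q in zip(f_lo, f_hi)]
    return weakly_fosd(f_mid, mix)


f_lo, f_hi = [0.6, 0.3, 0.1], [0.1, 0.3, 0.6]
ok = cdfp_holds_for_triple(1.0, [0.2, 0.4, 0.4], 0.0, f_lo, 2.0, f_hi)
not_ok = cdfp_holds_for_triple(1.0, [0.5, 0.3, 0.2], 0.0, f_lo, 2.0, f_hi)
print(ok, not_ok)
```

A full check of CDFP would loop over every triple of actions whose costs admit such a convex combination; the single-triple function is kept separate to make the definition easy to read.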
We can now answer the second question: 

Proposition 12 Assume there is no shifting support, that $u(\cdot)$ is strictly concave
and differentiable, and that MLRP and CDFP are met. Then the optimal
contract given the hidden-action problem satisfies $s_1 \le \cdots \le s_N$.

Proof: Let $\hat a$ be the action the principal wishes to implement. If $\hat a$ is a least-cost
action, then the result follows from Proposition 6; hence assume that $\hat a$ is not a
least-cost action. Let $A' = \{a \mid c(a) \le c(\hat a)\}$; that is, $A'$ is the set of actions that cause
the agent no more disutility than $\hat a$. Consider the principal's problem of implementing $\hat a$
under the assumption that the space of actions is $A'$. By MLRP, $f_n(a)/f_n(\hat a)$ is
nonincreasing in $n$ for all $a \in A'$, so it follows from (28) that $s_1 \le \cdots \le s_N$ under
the optimal contract for this restricted problem. The result then follows if we can
show that this contract remains optimal when we expand $A'$ to $A$. Adding actions
cannot reduce the cost of implementing $\hat a$, hence we are done if we can show that the
optimal contract for the restricted problem is incentive compatible in the unrestricted
problem. That is, if there is no $\tilde a$, $c(\tilde a) > c(\hat a)$, such that

$$\mathbf f(\tilde a)^{\mathsf T}\mathbf u - c(\tilde a) > \mathbf f(\hat a)^{\mathsf T}\mathbf u - c(\hat a), \tag{29}$$

where $\mathbf u = (u(s_1), \ldots, u(s_N))^{\mathsf T}$. As demonstrated in the proof of Proposition 11, the
incentive compatibility constraint between $\hat a$ and at least one $a' \in A'$, $c(a') < c(\hat a)$, is
binding; i.e.,

$$\mathbf f(a')^{\mathsf T}\mathbf u - c(a') = \mathbf f(\hat a)^{\mathsf T}\mathbf u - c(\hat a). \tag{30}$$

Because $c(\hat a) \in (c(a'), c(\tilde a))$, there exists a $\lambda \in (0, 1)$ such that $c(\hat a) = (1 - \lambda)\,c(a') +
\lambda c(\tilde a)$. Using CDFP and the fact that $u(s_1) \le \cdots \le u(s_N)$, we have

$$\mathbf f(\hat a)^{\mathsf T}\mathbf u - c(\hat a) \ge (1 - \lambda)\,\mathbf f(a')^{\mathsf T}\mathbf u + \lambda\,\mathbf f(\tilde a)^{\mathsf T}\mathbf u - c(\hat a)$$
$$= (1 - \lambda)\left[\mathbf f(a')^{\mathsf T}\mathbf u - c(a')\right] + \lambda\left[\mathbf f(\tilde a)^{\mathsf T}\mathbf u - c(\tilde a)\right].$$

By (30), the first bracketed term equals the left-hand side, so the agent's payoff from $\hat a$
is at least his payoff from $\tilde a$. But this is inconsistent with (29); that is, (29) cannot hold,
as was required. ■



Lastly, we come to question 3. An information structure for a principal-agent
problem is $F = \{\mathbf f(a) \mid a \in A\}$. A principal-agent problem can, then, be
summarized as $\mathfrak P = \langle A, X, F, B(\cdot), c(\cdot), u(\cdot), U_R \rangle$.

35 Recall that distribution $G(\cdot)$ first-order stochastically dominates distribution $H(\cdot)$ if
$G(z) \le H(z)$ for all $z$ and strictly less than for some $z$.

Proposition 13 Consider two principal-agent problems that are identical except
for their information structures (i.e., consider

$$\mathfrak P^1 = \langle A, X, F^1, B(\cdot), c(\cdot), u(\cdot), U_R \rangle$$

and

$$\mathfrak P^2 = \langle A, X, F^2, B(\cdot), c(\cdot), u(\cdot), U_R \rangle).$$

Suppose there exists a stochastic transformation matrix $\mathbf Q$ (i.e., a garbling) 36
such that $\mathbf f^2(a) = \mathbf Q \mathbf f^1(a)$ for all $a \in A$, where $\mathbf f^i(a)$ denotes an element of
$F^i$. Then, for all $a \in A$, the principal's expected cost of optimally implementing
action $a$ in the first principal-agent problem, $\mathfrak P^1$, is not greater than her expected
cost of optimally implementing $a$ in the second principal-agent problem, $\mathfrak P^2$.



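Proof: Fix $a$. If $a$ is not implementable in $\mathfrak P^2$, then the result follows immediately.
Suppose, then, that $a$ is implementable in $\mathfrak P^2$ and let $\mathbf u^2$ be the optimal contract for
implementing $a$ in that problem. Consider the contract $\mathbf u^1 = \mathbf Q^{\mathsf T}\mathbf u^2$. We will show
that $\mathbf u^1$ implements $a$ in $\mathfrak P^1$. Because

$$\mathbf f^1(a)^{\mathsf T}\mathbf u^1 = \mathbf f^1(a)^{\mathsf T}\mathbf Q^{\mathsf T}\mathbf u^2 = \mathbf f^2(a)^{\mathsf T}\mathbf u^2,$$

the fact that $\mathbf u^2$ satisfies IR and IC in $\mathfrak P^2$ can readily be shown to imply that $\mathbf u^1$
satisfies IR and IC in $\mathfrak P^1$. The principal's cost of optimally implementing $a$ in $\mathfrak P^1$ is no
greater than her cost of implementing $a$ in $\mathfrak P^1$ using $\mathbf u^1$. By construction, $u^1_n = \mathbf q_{\cdot n}^{\mathsf T}\mathbf u^2$,
where $\mathbf q_{\cdot n}$ is the $n$th column of $\mathbf Q$. Because $s^1_n = u^{-1}(u^1_n)$ and $u^{-1}(\cdot)$ is convex, it
follows from Jensen's inequality that

$$s^1_n \le \sum_{m=1}^{N} q_{mn}\,u^{-1}(u^2_m) = \sum_{m=1}^{N} q_{mn}\,s^2_m$$

(recall $\mathbf q_{\cdot n}$ is a probability vector). Consequently,

$$\sum_{n=1}^{N} f^1_n(a)\,s^1_n \le \sum_{n=1}^{N} f^1_n(a) \sum_{m=1}^{N} q_{mn}\,s^2_m = \sum_{m=1}^{N} f^2_m(a)\,s^2_m.$$

The result follows. ■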



Proposition 13 states that if two principal-agent problems are the same,
except that they have different information structures, where the information
structure of the first problem is more informative than the information structure
of the second problem (in the sense of Blackwell's theorem), then the principal's
expected cost of optimally implementing any action is no greater in the first
problem than in the second problem. By strengthening the assumptions slightly,
we can, in fact, conclude that the principal's expected cost is strictly less in the
first problem. In other words, making the signal more informative about the
agent's action makes the principal better off. This is consistent with our earlier
findings that (i) the value of the performance measures is solely their statistical
properties as correlates of the agent's action; and (ii) the better correlates
(technically, the more informative) they are, the lower the cost of the
hidden-action problem.

36 A stochastic transformation matrix, sometimes referred to as a garbling, is a matrix in
which each column is a probability density (i.e., has non-negative elements that sum to one).

It is worth observing that Proposition 13 implies that the optimal incentive
scheme never entails paying the agent with lotteries over money (i.e., randomly
mapping the realized performance levels via weights $\mathbf Q$ into payments).
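The key step in the argument behind Proposition 13 can be illustrated numerically. The Python sketch below uses a hypothetical 2x2 garbling matrix and an arbitrary (not necessarily optimal) utility contract for the garbled problem, with $u(s) = \sqrt{s}$ assumed so that a utility level $v$ costs $v^2$. It confirms that the pulled-back contract gives the agent the same expected utility at a weakly lower expected cost.

```python
def garble(Q, f1):
    """Return f2 = Q f1, where Q[m][n] is the probability that signal n
    in the first problem is reported as signal m in the second."""
    return [sum(Q[m][n] * f1[n] for n in range(len(f1)))
            for m in range(len(Q))]


def pull_back(Q, u2):
    """Return u1 = Q^T u2, the first-problem contract that mimics u2."""
    return [sum(Q[m][n] * u2[m] for m in range(len(u2)))
            for n in range(len(Q[0]))]


def expected_cost(f, u):
    # Assumes u(s) = sqrt(s): delivering utility v requires payment v**2.
    return sum(p * v ** 2 for p, v in zip(f, u))


Q = [[0.8, 0.3],
     [0.2, 0.7]]        # hypothetical garbling; each column sums to one
f1 = [0.3, 0.7]         # density in the more informative problem (assumption)
u2 = [1.0, 4.0]         # some utility contract in the garbled problem

f2 = garble(Q, f1)
u1 = pull_back(Q, u2)
exp_util_1 = sum(p * v for p, v in zip(f1, u1))
exp_util_2 = sum(p * v for p, v in zip(f2, u2))
print(f2, u1, expected_cost(f1, u1), expected_cost(f2, u2))
```

Because the same identity holds for the density induced by every action, IR and IC carry over from the garbled problem; the cost comparison is Jensen's inequality applied column by column.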



4.5 A continuous performance measure 

Suppose that X were a real interval — which, without loss of generality, we can 
take to be R — rather than a discrete space and suppose, too, that F (x\a) were 
a continuous and differentiable function with corresponding probability density 
function f(x\a). How would this change our analysis? By one measure, the 
answer is not much. Only three of our proofs rely on the assumption that X is 
finite; namely the proofs of Propositions 5, 10 (and, there, only the existence 
part), 37 and 13. Moreover, the last of the three can fairly readily be extended to 
the continuous case. Admittedly, it is troubling not to have general conditions 
for implementability and existence of an optimal contract, but in many specific 
situations we can, nevertheless, determine the optimal contract. 38 

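With $X = \mathbb R$, the principal's problem (the equivalent of (19)) becomes

$$\min_{u(x)} \int_{-\infty}^{\infty} u^{-1}[u(x)]\,f(x|\hat a)\,dx$$

subject to

$$\int_{-\infty}^{\infty} u(x)\,f(x|\hat a)\,dx - c(\hat a) \ge U_R; \text{ and}$$

$$\int_{-\infty}^{\infty} u(x)\,f(x|\hat a)\,dx - c(\hat a) \ge \int_{-\infty}^{\infty} u(x)\,f(x|a)\,dx - c(a) \quad \forall a \in A.$$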

We know the problem is trivial if there is a shifting support, so assume the
support of $x$, supp$\{x\}$, is invariant with respect to $a$. 39 Assuming an optimal
contract exists to implement $\hat a$, that contract must satisfy the modified Borch
sharing rule:

$$\frac{1}{u'\!\left[u^{-1}(u(x))\right]} = \lambda + \sum_{j=1}^{J-1} \mu_j \left[1 - \frac{f(x|a_j)}{f(x|\hat a)}\right]$$

for almost every $x \in$ supp$\{x\}$. Observe that this is just a variation on (9) or (28).

37 Where our existence proof "falls down" when $X$ is continuous is that our proof relies on
the fact that a continuous function from $\mathbb R^N \to \mathbb R$ has a minimum on a closed and bounded set.
But, here, the contract space is no longer a subset of $\mathbb R^N$, but rather the space of all functions
from $X \to \mathbb R$; and there is no general result guaranteeing the existence of a minimum in this
case.

38 Page (1987) considers conditions for existence in this case (actually he also allows for
$A$ to be a continuous space). Most of the assumptions are technical, but not likely to be
considered controversial. Arguably a problematic assumption in Page is that the space of
possible contracts is constrained; that is, assumptions are imposed on an endogenous feature
of the model, the contracts. In particular, if $\mathcal S$ is the space of permitted contracts, then there
exist $L$ and $M \in \mathbb R$ such that $L \le s(x) \le M$ for all $s(\cdot) \in \mathcal S$ and all $x \in X$. Moreover,
$\mathcal S$ is closed under the topology of pointwise convergence. On the other hand, it could be
argued that the range of real-life contracts must be bounded: Legal and other constraints on
what payments the parties can make effectively limit the space of contracts to some set of
bounded functions.

39 That is,

$$\{x \mid f(x|a) > 0\} = \{x \mid f(x|a') > 0\}$$

for all $a$ and $a'$ in $A$.

4.6 Bibliographic notes

Much of the analysis in this section has been drawn from Grossman and Hart
(1983). In particular, they deserve credit for Propositions 4, 8, and 10-13
(although, here and there, we've made slight modifications to the statements or
proofs). Proposition 5 is based on Hermalin and Katz (1991). The rest of the
analysis represents well-known results.



5 Continuous Action Space 

So far, we've limited attention to finite action spaces. Realistic though this may
be, it can serve to limit the tractability of many models, particularly when we
need to assume the action space is large. A large action space can be problematic
for two related reasons. First, under the two-step approach, we are
obligated to solve for the optimal contract for each $a \in A$ (or at least each
$a \in A^I$); then, letting $C(a)$ be the expected cost of inducing action $a$ under its
corresponding optimal contract, we next maximize $B(a) - C(a)$, expected benefit
net of expected cost. If $A$ is large, then this is clearly a time-consuming and
potentially impractical method for solving the principal-agent problem. The
second reason a large action space can be impractical is because it can mean
many constraints in the optimization program involved with finding the optimal
contract for a given action (recall, e.g., that we had $J - 1$ constraints, one for
each action other than the given action). Again, this raises issues about the
practicality of solving the problem.

These problems suggest that we would like a technique that allows us to
solve program (15) on page 30,

$$\max_{S(\cdot),\,a} \int W(S(x), x, b)\,dG(b, x|a) \tag{31}$$

subject to

$$a \in \arg\max_{a'} \int_X U(S(x), x, a')\,dF(x|a') \tag{32}$$

and

$$\max_{a'} \int_X U(S(x), x, a')\,dF(x|a') \ge U_R,$$

directly, in a one-step procedure. Generally, to make such a maximization program
tractable, we would take $A$ to be a compact and continuous space (e.g., a
closed interval on $\mathbb R$), and employ standard programming techniques. A number
of complications arise, however, if we take such an approach.

Most of these complications have to do with how we treat the IC constraint,
expression (32). To make life simpler, suppose that $A = [\underline a, \bar a] \subset \mathbb R$, $X = \mathbb R$,
that $F(\cdot|a)$ is differentiable and, moreover, that the expression in (32) is itself
differentiable for all $a \in A$. Then, a natural approach would be to observe that
if $a \in (\underline a, \bar a)$ maximizes that expression, it must necessarily be the solution to
the first-order condition to (32):

$$\int_X \left[U_a(S(x), x, a)\,f(x|a) + U(S(x), x, a)\,f_a(x|a)\right] dx = 0 \tag{33}$$

(where subscripts denote partial derivatives). Conversely, if we knew that the
second-order condition was also met, (33) would be equivalent to (32) and we
could use it instead of (32), at least locally. Unhappily, we don't, in general,
know (i) that the second-order condition is met and (ii) that, even if it is, the $a$
solving (33) is a global rather than merely local maximum. For many modeling
problems in economics, we would avoid these headaches by simply assuming
that (32) is globally strictly concave in $a$, which would ensure both the
second-order condition and the fact that an $a$ solving (33) is a global maximum. We
can't, however, do that here: The concavity of (32) will, in general, depend
on $S(x)$; but since $S(\cdot)$ is endogenous, we can't make assumptions about it.
If, then, we want to substitute (33) for (32), we need to look for other ways
to ensure that (33) describes a global maximum. Not surprisingly, these ways
are, in general, complicated and we direct the reader interested in a "general"
approach to consider Rogerson (1985) and Jewitt (1988).

An additional complication arises with whether (31) also satisfies the properties
that would allow us to conclude from first-order conditions that a global
maximum has, indeed, been reached. Fortunately, in many problems, this issue
is less severe because we typically impose the functional form

$$W(S(x), x, b) = x - S(x),$$

which gives the problem sufficient structure to allow us to validate a "first-order
approach."

In the rest of this section, we develop a simple model in which a first-order
approach is valid.

5.1 The first-order approach with a spanning condition

Assume, henceforth, that $A = [\underline a, \bar a] \subset \mathbb R$, $X = [\underline x, \bar x] \subset \mathbb R$, and that $F(\cdot|a)$ is
differentiable. Let $f(\cdot|a)$ be the associated probability density function for each
$a \in A$. We further assume that

1. $U(S(x), x, a) = u[S(x)] - a$;
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2. u(-) is strictly increasing and strictly concave; 

3. the domain of $u(\cdot)$ is $(\underline s, \infty)$, $\lim_{s \downarrow \underline s} u(s) = -\infty$, and $\lim_{s \to \infty} u(s) = \infty$;

4. $f(x|a) > 0$ for all $x \in X$ and for all $a \in A$ (i.e., there is no shifting
support);

5. $F(x|a) = \gamma(a) F_H(x) + (1 - \gamma(a)) F_L(x)$ and $f(x|a) = \gamma(a) f_H(x) +
(1 - \gamma(a)) f_L(x)$ for all $x$ and $a$, where $\gamma : A \to [0, 1]$ and $F_H(\cdot)$ and
$F_L(\cdot)$ are distribution functions on $X$;

6. $\gamma(\cdot)$ is strictly increasing, strictly concave, and twice differentiable; and

7. $f_L(x)/f_H(x)$ satisfies the MLRP (i.e., $f_L(x)/f_H(x)$ is non-increasing in
$x$ and there exist $x'$ and $x''$ in $X$, $x' < x''$, such that $f_L(x')/f_H(x') >
f_L(x'')/f_H(x'')$).

Observe Assumptions 5-7 allow us, inter alia, to assume that $c(a) = a$ without
loss of generality. Assumption 5 is known as a spanning condition.
In what follows, the following result will be critical:

Lemma 2 $F_H(\cdot)$ dominates $F_L(\cdot)$ in the sense of first-order stochastic
dominance.



Proof: We need to show $F_H(x) \le F_L(x)$ for all $x \in X$ and strictly less for some $x$.
To this end, define

$$\Delta(z) = \int_{\underline x}^{z} \left[f_H(x) - f_L(x)\right] dx.$$

We wish to show that $\Delta(z) \le 0$ for all $z \in X$ and strictly less for some $z$. Observe
that

$$\Delta(z) = \int_{\underline x}^{z} \left[1 - \frac{f_L(x)}{f_H(x)}\right] f_H(x)\,dx.$$

Let

$$\delta(x) = 1 - \frac{f_L(x)}{f_H(x)}.$$

By MLRP, $\delta(\cdot)$ is non-decreasing everywhere and increases at one $x$ at least. Because
$\Delta(\bar x) = 0$, $f_H(\cdot) > 0$, $\bar x - \underline x > 0$, and $\delta(\cdot)$ is not constant, it follows that $\delta(\cdot)$ must
be negative on some sub-interval of $X$ and positive on some other. Because $\delta(\cdot)$ is
non-decreasing, there must, therefore, exist an $\tilde x \in (\underline x, \bar x)$ such that

$\delta(x) \le 0$ for all $x < \tilde x$ (and strictly less for $\underline x < x < x' < \tilde x$, for some $x'$); and
$\delta(x) \ge 0$ for all $x > \tilde x$ (and strictly greater for $\bar x > x > x'' > \tilde x$, for some $x''$).

For $\underline x < z \le \tilde x$, this implies that $\Delta(z) < 0$: it is the integral of a quantity that is negative
over some range of the integral and never positive anywhere on that range. Finally,
consider $z \in (\tilde x, \bar x)$. $\Delta'(z) = \delta(z) f_H(z) \ge 0$ for all $z$ in that interval. Hence, because
$\Delta(\tilde x) < 0$ and $\Delta(\bar x) = 0$, it must be that $\Delta(z) \le 0$ for all $z \in (\tilde x, \bar x)$. We've just shown
that $\Delta(z) \le 0$ for all $z \in X$ and strictly less for some $z$, which yields the result. ■

A consequence of this Lemma is that if $\phi(\cdot)$ is an increasing function, then

$$\int_X \phi(x) \left[f_H(x) - f_L(x)\right] dx > 0. \tag{34}$$
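Lemma 2 and inequality (34) are easy to sanity-check numerically. The Python sketch below uses a pair of hypothetical linear densities on $[0, 1]$ (an assumption, chosen only because they are strictly positive and have a decreasing ratio $f_L/f_H$) and a simple midpoint rule for the integrals.

```python
def f_H(x):
    # Hypothetical "favorable" density on [0, 1].
    return 0.5 + x


def f_L(x):
    # Hypothetical "unfavorable" density on [0, 1].
    return 1.5 - x


def integrate(g, lo, hi, steps=2000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))


def Delta(z):
    # Delta(z) = F_H(z) - F_L(z).
    return integrate(lambda x: f_H(x) - f_L(x), 0.0, z)


grid = [i / 20 for i in range(21)]
ratios = [f_L(x) / f_H(x) for x in grid]
mlrp_ok = all(b <= a for a, b in zip(ratios, ratios[1:]))
fosd_ok = all(Delta(z) <= 1e-9 for z in grid) and Delta(0.5) < 0


def phi(x):
    # An increasing test function for (34).
    return x ** 3


ineq_34 = integrate(lambda x: phi(x) * (f_H(x) - f_L(x)), 0.0, 1.0)
print(mlrp_ok, fosd_ok, ineq_34)
```

For these densities $\Delta(z) = z^2 - z$, which is negative on the interior of the interval and zero at the endpoints, and the integral in (34) with $\phi(x) = x^3$ evaluates to $2/5 - 1/4$.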

It follows, then, that if $S(\cdot)$ (and, thus, $u[S(x)]$) is increasing, then (32) is
globally concave in $a$. To see this, observe

$$\int_X U(S(x), x, a)\,dF(x|a) = \int_X u[S(x)] \left[\gamma(a) f_H(x) + (1 - \gamma(a)) f_L(x)\right] dx - a;$$

so we have

$$\frac{d}{da} \int_X U(S(x), x, a)\,dF(x|a) = \int_X u[S(x)] \left[f_H(x) - f_L(x)\right] \gamma'(a)\,dx - 1,$$

where the integral is positive by (34) and the assumption that $\gamma'(\cdot) > 0$. Moreover,

$$\frac{d^2}{da^2} \int_X U(S(x), x, a)\,dF(x|a) = \int_X u[S(x)] \left[f_H(x) - f_L(x)\right] \gamma''(a)\,dx < 0$$

by (34) and the assumption that $\gamma''(\cdot) < 0$. To summarize:

Corollary 2 If $S(\cdot)$ is increasing, then the agent's choice-of-action problem is
globally concave. That is, we're free to substitute

$$\int_X u[S(x)] \left[f_H(x) - f_L(x)\right] \gamma'(a)\,dx - 1 = 0 \tag{35}$$

for (32).



We'll now proceed as follows. We'll suppose that $S(\cdot)$ is increasing and we'll
solve the principal's problem. Of course, when we're done, we'll have to double
check that our solution indeed yields an increasing $S(\cdot)$. It will, but if it didn't,
then our approach would be invalid. The principal's problem is

$$\max_{S(\cdot),\,a} \int_X \left[x - S(x)\right] f(x|a)\,dx$$

subject to (35) and the IR constraint,

$$\int_X u[S(x)]\,f(x|a)\,dx - a \ge U_R.$$

As we've shown many times now, this last constraint must be binding; so we
have a classic constrained optimization program. Letting $\lambda$ be the Lagrange
multiplier on the IR constraint and letting $\mu$ be the Lagrange multiplier on
(35), we obtain the first-order conditions:

$$-f(x|a) + \mu u'[S(x)] \left[f_H(x) - f_L(x)\right] \gamma'(a) + \lambda u'[S(x)]\,f(x|a) = 0$$
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differentiating by S (x); and 

$$\int_X \left[x - S(x)\right] \left[f_H(x) - f_L(x)\right] \gamma'(a)\,dx + \mu \int_X u[S(x)] \left[f_H(x) - f_L(x)\right] \gamma''(a)\,dx = 0 \tag{36}$$

differentiating by $a$ (there's no $\lambda$ expression in the second condition because,
by (35), the derivative of the IR constraint with respect to $a$ is zero). We can
rearrange the first condition into our familiar modified Borch sharing rule:

$$\frac{1}{u'[S(x)]} = \lambda + \mu\,\frac{\left[f_H(x) - f_L(x)\right] \gamma'(a)}{\gamma(a) f_H(x) + (1 - \gamma(a)) f_L(x)} = \lambda + \mu\,\frac{\left[1 - r(x)\right] \gamma'(a)}{\gamma(a)\left[1 - r(x)\right] + r(x)},$$

where $r(x) = f_L(x)/f_H(x)$. Recall that $1/u'[\cdot]$ is an increasing function; hence,
to test whether $S(\cdot)$ is indeed increasing, we need to see whether the right-hand
side is decreasing in $r(x)$ since $r(\cdot)$ is decreasing. Straightforward calculations
reveal that the derivative of the right-hand side with respect to $r(x)$ is

$$\frac{-\mu\,\gamma'(a)}{\left(r(x) + (1 - r(x))\,\gamma(a)\right)^2} < 0.$$

We've therefore shown that $S(\cdot)$ is indeed increasing as required; that is, our
use of (35) for (32) was valid.
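For readers who want the "straightforward calculations" spelled out, the following is one way to carry them out. It holds $a$ fixed, writes $r$ for $r(x)$, $\gamma$ for $\gamma(a)$, and $\gamma'$ for $\gamma'(a)$, and applies the quotient rule to the right-hand side of the sharing rule; the sign conclusion presumes $\mu > 0$.

```latex
\begin{align*}
\frac{\partial}{\partial r}\left[\lambda + \mu\,\frac{(1-r)\,\gamma'}{\gamma(1-r) + r}\right]
  &= \mu\gamma'\,\frac{-\bigl[\gamma(1-r) + r\bigr] - (1-r)(1-\gamma)}
                      {\bigl[\gamma(1-r) + r\bigr]^{2}} \\
  &= \mu\gamma'\,\frac{-\gamma + \gamma r - r - 1 + r + \gamma - \gamma r}
                      {\bigl[\gamma(1-r) + r\bigr]^{2}} \\
  &= \frac{-\mu\,\gamma'}{\bigl[r + (1-r)\,\gamma\bigr]^{2}}.
\end{align*}
```

Since $\gamma' > 0$, the expression is negative whenever $\mu > 0$, so the right-hand side falls as $r$ rises; because $r(\cdot)$ is itself decreasing in $x$, the right-hand side, and hence $S(x)$, rises with $x$.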

Observe, from (36), that, because the agent's second-order condition is met
(so that, given $\mu > 0$, the second line in (36) is negative),
the first line in (36) must be positive; that is,

$$\int_X \left[x - S(x)\right] \left[f_H(x) - f_L(x)\right] dx > 0.$$

But this implies that, for this $S(\cdot)$, the principal's problem is globally concave
in $a$:

$$\frac{d^2}{da^2} \int_X \left[x - S(x)\right] f(x|a)\,dx = \int_X \left[x - S(x)\right] \left[f_H(x) - f_L(x)\right] \gamma''(a)\,dx < 0.$$

Moreover, for any $S(\cdot)$, the principal's problem is (trivially) concave in $S(\cdot)$.
Hence, we may conclude that the first-order approach is, indeed, valid for this
problem.

Admittedly, the spanning condition is a fairly stringent condition, although
it does have an economic interpretation. Suppose there are two distributions
from which the performance measure could be drawn, "favorable" (i.e., $F_H(\cdot)$)
and "unfavorable" (i.e., $F_L(\cdot)$). The harder the agent works (i.e., the higher the $a$
he chooses), the greater the probability, $\gamma(a)$, that the performance measure will be
drawn from the favorable distribution. For instance, suppose there are two types of potential
customers, those who tend to buy a lot (the H type) and those who tend not
to buy much (the L type). By investing more effort, $a$, in learning his territory,
the salesperson (agent) increases the probability that he will sell to H types
rather than L types.
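The salesperson story can be turned into a small numerical check of the agent's problem under the spanning condition. In the Python sketch below, every functional form is an assumption made for illustration: $\gamma(a) = 1 - e^{-a}$ on $[0, 3]$, linear densities on $[0, 1]$, an increasing pay schedule `S`, and $u(s) = 10 \ln s$. The check confirms that the agent's expected payoff is concave in effort and that the grid maximizer matches the solution of the first-order condition.

```python
from math import exp, log


def gamma(a):
    # Hypothetical probability of drawing from F_H: increasing and concave.
    return 1.0 - exp(-a)


def f_H(x):
    return 0.5 + x      # assumed "H-type" density on [0, 1]


def f_L(x):
    return 1.5 - x      # assumed "L-type" density on [0, 1]


def S(x):
    return 1.0 + 20.0 * x   # an increasing pay schedule (assumption)


def u(s):
    return 10.0 * log(s)    # strictly increasing and strictly concave


def integrate(g, lo=0.0, hi=1.0, steps=2000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))


EU_H = integrate(lambda x: u(S(x)) * f_H(x))   # expected utility under F_H
EU_L = integrate(lambda x: u(S(x)) * f_L(x))   # expected utility under F_L


def agent_payoff(a):
    # Spanning: f(x|a) = gamma(a) f_H(x) + (1 - gamma(a)) f_L(x); U = u(S) - a.
    return gamma(a) * EU_H + (1.0 - gamma(a)) * EU_L - a


grid = [0.01 * i for i in range(301)]
values = [agent_payoff(a) for a in grid]
second_diffs = [values[i - 1] - 2 * values[i] + values[i + 1]
                for i in range(1, len(values) - 1)]
concave = all(d < 0 for d in second_diffs)
best_a = max(grid, key=agent_payoff)

# First-order condition: gamma'(a) * (EU_H - EU_L) = 1, so a = ln(EU_H - EU_L).
foc_a = log(EU_H - EU_L)
print(concave, best_a, foc_a)
```

Because the payoff is linear in $\gamma(a)$, its curvature in $a$ is inherited entirely from $\gamma(\cdot)$ once `EU_H` exceeds `EU_L`, which is what an increasing pay schedule and first-order stochastic dominance deliver.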

5.2 Bibliographic notes 

The first papers to use the first-order approach were Holmstrom (1979) and 
Shavell (1979). Grossman and Hart (1983) was, in large part, a response to the 
potential invalidity of the first-order approach. Our analysis under the spanning 
condition draws, in part, from Hart and Holmstrom (1987). 
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